influence-based network-oblivious - icdm 2013

Influence-based Network-oblivious Community Detection

Nicola Barbieri, Francesco Bonchi

Yahoo! Labs - Barcelona, Spain, {barbieri,bonchi}@yahoo-inc.com

Giuseppe Manco

ICAR-CNR - Rende, Italy, [email protected]

The task of detecting close-knit communities of like-minded people in on-line social networks has plenty of applications in marketing and personalization.

If a user responded positively to a certain campaign:

TARGET USERS IN THE SAME COMMUNITY.

By homophily, one can expect similar users to be more likely to be interested in the same product than random users.

If more users in the same community adopt the same product, this might eventually create a word-of-mouth buzz.

How can we detect communities when the social graph is not available?

The companies that would mostly benefit from knowing the structure of the social network often do not have access to the network!

A company advertising or developing applications over an on-line social network owns the log of user activity that it produces.

Exploit the phenomenon of social contagion to detect communities.

Influence-driven information propagation.

Users perform actions (likes, purchases, shares, tweets) and those actions propagate across the network.

A propagation model governs how influence propagates across a network.

Independent Cascade Model: When a node (v) becomes active it is considered contagious and it has a single chance to activate each inactive neighbor (u) with probability $p_{v,u}$.

As information spreads over social connections, the network naturally shapes the process of information diffusion.

Stochastic Framework for Network-Oblivious CD.

We assume that user activities are governed by an underlying stochastic diffusion process over the unobserved social network.

influence exerted by the other members of the community on u for adopting i.

• By fitting the model parameters to the user activity log D, we learn the community membership and influence levels.

This general framework can be instantiated to different community-level influence diffusion models, thus obtaining different methods. We will introduce two such models in Section ?? and ?? respectively. We conclude this section by presenting the EM-like algorithm for fitting the model parameters to the user activity log.

Modeling maximum likelihood. We assume that each propagation trace is independent from the others, and we adopt a maximum a-posteriori perspective. That is, we hypothesize that action probabilities adhere to a mathematical model governed by a set of parameters $\Theta$. The likelihood of the data given the model parameters $\Theta$ can hence be expressed as:

$$L(\Theta; D) = \prod_{u \in V} P(u \mid \Theta)$$

where $P(u \mid \Theta)$ represents the likelihood to observe u's behavior relative to D. As a consequence, the corresponding learning problem is finding the optimal $\hat{\Theta}$ that maximizes $L(\Theta; D)$.

Following the standard mixture modeling approach [?], we assume that users' actions can only happen relative to a community of membership. That is, we assume that a hidden binary variable $z_{u,k}$ denotes the membership of user u to community k, with the constraints $\sum_{k=1}^{K} z_{u,k} = 1$. Thus, $\Theta$ can be partitioned into $\{\pi_1, \ldots, \pi_K, \Theta_1, \ldots, \Theta_K\}$, where $\Theta_k$ represents the parameter set relative to community k, and $\pi_k = P(z_{u,k} = 1)$. We can rewrite the likelihood as

$$L(\Theta; D) = \prod_{u} \sum_{k=1}^{K} P(u \mid \Theta_k)\, \pi_k,$$

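which can be optimized by resorting to the traditional EM algorithm: we rewrite the complete-likelihood as

$$P(D, Z, \Theta) = P(D \mid Z, \Theta) \cdot P(Z \mid \Theta) \cdot P(\Theta) \tag{1}$$

where

$$P(D \mid Z, \Theta) = \prod_{u \in V} \prod_{k=1}^{K} P(u \mid \Theta_k)^{z_{u,k}}, \qquad P(Z \mid \Theta) = \prod_{u \in V} \prod_{k=1}^{K} \pi_k^{z_{u,k}},$$

and $P(\Theta)$ represents the prior relative to the parameter set $\Theta$. Inspired by [?], we choose to model the latter as

$$P(\Theta) \propto \prod_{k=1}^{K} \pi_k^{-\frac{1}{2}\sqrt{|\Theta_k|}}$$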

with the interpretation that, for fixed K, the parameters $\pi_k$ allow an "improper" Dirichlet-type prior. The advantage is that this enables a formulation of the EM algorithm which allows the automatic detection of the optimal number K of communities. By standard manipulation of Eq. ??, the Complete-Data Expectation Likelihood [?] is given by:

$$\begin{aligned} Q(\Theta; \Theta') &= E[\log P(D, Z, \Theta) \mid D; \Theta'] \\ &\propto \sum_{u \in V} \sum_{k=1}^{K} \gamma_{u,k} \{\log P(u \mid \Theta_k) + \log \pi_k\} - \sum_{k=1}^{K} \frac{N_k}{2} \log \pi_k \end{aligned} \tag{2}$$

where $N_k = \sqrt{|\Theta_k|}$, and $\gamma_{u,k} \equiv P(z_{u,k} = 1 \mid u, \Theta')$.

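Optimizing $Q(\Theta; \Theta')$ with respect to $\pi_k$ under the constraints $\sum_k \pi_k = 1$, $0 \le \pi_k \le 1$ yields

$$\pi_k = \frac{\max\{0, \sum_{u \in V} \gamma_{u,k} - N_k/2\}}{\sum_{k'=1}^{K} \max\{0, \sum_{u \in V} \gamma_{u,k'} - N_{k'}/2\}} \tag{3}$$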
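To make the update in Eq. 3 concrete, here is a minimal Python sketch. The function name `update_mixing_weights`, the list-of-lists layout for the responsibilities, and the `n_params` argument (holding the parameter counts $|\Theta_k|$) are assumptions made for this illustration only; they do not come from the original text.

```python
import math

def update_mixing_weights(gamma, n_params):
    """Mixing weights pi_k with annihilation (sketch of Eq. 3).

    gamma[u][k]  : responsibility of community k for user u (hypothetical layout)
    n_params[k]  : number of parameters |Theta_k| of community k (hypothetical input)
    """
    raw = []
    for k in range(len(n_params)):
        support = sum(row[k] for row in gamma)      # sum over users of gamma_{u,k}
        penalty = math.sqrt(n_params[k]) / 2.0      # N_k / 2 with N_k = sqrt(|Theta_k|)
        raw.append(max(0.0, support - penalty))     # weakly supported communities drop to 0
    total = sum(raw)
    if total == 0.0:
        raise ValueError("every community was annihilated")
    return [r / total for r in raw]
```

A community whose total responsibility falls below $N_k/2$ receives weight exactly zero and can be discarded in later iterations.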

Here, the proposed prior allows an adjustment to the estimation of the $\pi_k$ parameters which enables "annihilation": a community not supported by a sufficient number of users is suppressed. Thus, we can start with an arbitrarily large initial number of communities, and then infer the final number K by letting some of the mixing probabilities $\pi_k$ be zero.

The general EM scheme is shown in Alg. ??. Notice that, as explained in [?], a further advantage of the scheme is its robustness to random initialization: by starting with an arbitrarily large number of components, we can avoid the pitfalls of local maxima, since the whole parameter space is likely to be covered. As a side note, the modeling of the prior $P(\Theta)$ is a major difference w.r.t. [?]: when $|\Theta_k|$ is of the same order of magnitude as $|V|$, the original formulation of the prior in [?] would produce an underestimation of the number of communities. Furthermore, reducing the weight of $|\Theta_k|$ in the computation of $\pi_k$ allows us to reformulate the algorithm without optimizing the components in sequence, which would require a prohibitive computational cost for large D.

The above modeling is a general framework, which is parametric to the component $P(u \mid \Theta_k)$. In turn, the latter depends on the way we model the probability $P(a)$ for given actions $a \equiv (u, i, t)$. We explore two different ways of modeling the probability $P(a)$, which focus on two different perspectives.

1) The probability that u adopts i is the result of a Bernoullian process on i, i.e., $P(a) \equiv P(i \mid u, t)$, and time proceeds in discrete steps.

2) The final model does not consider whether a user adopts i, but when the adoption happens, i.e., $P(a) \equiv P(t \mid i, u)$.

We next explore each strategy in turn. We consider only binary activations: at a given timestamp, each user is either active or inactive, and an active user cannot become inactive again.

IV. COMMUNITY-LEVEL INDEPENDENT CASCADE MODEL

In the first alternative, we assume a Bernoullian model for users' adoptions of items. As a result, the likelihood $P(u \mid \Theta_k)$ can be specified over the observed binary data $Y_{i,u}$, where $Y_{i,u} = 1$ if $u \in C_i$, and $Y_{i,u} = 0$ otherwise.

[...] propagation model and a set of nodes $S \subseteq V$, the expected number of nodes "infected" in the viral cascade started with S is called the (expected) spread of S and denoted by $\sigma(S)$. The influence maximization problem asks for a set $S \subseteq V$, $|S| = k$, such that $\sigma(S)$ is maximum, where k is an input parameter.

The most studied propagation model is the so-called Independent Cascade (IC) model. We are given a directed social graph $G = (V, A)$ with arcs $(u, v) \in A$ labeled by influence probability $p_{u,v} \in (0, 1]$, representing the strength of the influence of u over v. If $(u, v) \notin A$, we define $p_{u,v} = 0$. At a given time step, each node is either active (an adopter of the product) or inactive. At time 0, a set S of seeds is activated. Time unfolds deterministically in discrete steps. As time unfolds, more and more neighbors of an inactive node v become active, eventually making v become active, and v's decision may in turn trigger further decisions by the nodes to which v is connected. In particular, in the IC model, when a node u first becomes active, say at time t, it has one chance at influencing each inactive neighbor v with probability $p_{u,v}$, independently of the history thus far. If the attempt succeeds, v becomes active at time t + 1.
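The IC dynamics described above can be sketched as a short simulation. This is a minimal illustration under assumptions: the function name `simulate_ic`, the dictionary encoding of arcs, and the default seeded random generator are choices made here, not part of the source.

```python
import random

def simulate_ic(arcs, seeds, rng=None):
    """Run one Independent Cascade simulation.

    arcs  : dict mapping (u, v) -> p_uv, the influence probability of u over v
            (hypothetical encoding; missing arcs have probability 0)
    seeds : iterable of nodes active at time 0
    Returns a dict node -> activation time.
    """
    rng = rng or random.Random(0)
    out = {}
    for (u, v), p in arcs.items():
        out.setdefault(u, []).append((v, p))

    activation_time = {s: 0 for s in seeds}
    frontier = list(activation_time)
    t = 0
    while frontier:
        next_frontier = []
        for u in frontier:
            # u became active at time t: one chance on each inactive neighbor
            for v, p in out.get(u, []):
                if v not in activation_time and rng.random() < p:
                    activation_time[v] = t + 1
                    next_frontier.append(v)
        frontier = next_frontier
        t += 1
    return activation_time
```

Averaging the number of activated nodes over many such runs gives a Monte Carlo estimate of the expected spread of the seed set.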

Influence maximization is generally NP-hard [?] under IC and other propagation models. Kempe et al., however, show that the function $\sigma(S)$ is monotone¹ and submodular². When equipped with such properties, the simple greedy algorithm that at each iteration greedily extends the set of seeds with the node providing the largest marginal gain produces a solution with provable approximation guarantee $(1 - 1/e)$ [?].

Though simple, the greedy algorithm is computationally prohibitive, since the step of selecting the node providing the largest marginal gain is #P-hard. In their paper, Kempe et al. run Monte Carlo simulations sufficiently many times to obtain an accurate estimate of the expected spread. However, running many propagation simulations is extremely costly on very large real-world social networks. Therefore, following [?], considerable effort has been devoted to developing methods for improving the efficiency and scalability of influence maximization [?, ?, ?, ?]. Regardless of such research efforts, scalability still remains an open challenge: on the problem instances that we consider in this paper, even a state-of-the-art algorithm such as CELF++ [?] takes from a few days to more than a week to extract a seed set of 50 nodes. Most of this literature on efficient algorithms for influence maximization assumes the weighted social graph is given, and does not address how the link influence probabilities $p_{u,v}$ can be obtained. This problem instead is addressed in [?, ?, ?].

Regardless of the fact that users' authoritativeness, expertise, trust and influence are evidently topic-dependent, only a few papers have looked at social influence from the topics perspective. Tang et al. [?] study the problem of learning user-to-user topic-wise influence strength. The input to their problem is the social network and a prior topic distribution for each node, which is given as input and inferred separately. Liu et al. [?] propose a probabilistic model for the joint inference of the topic distribution and topic-wise influence strength: here the input is a heterogeneous social network with nodes that are users and documents. The goal is to learn users' interest (topic distribution) and user-to-user influence. Lin et al. [?] study the joint modeling of influence and topics, by adopting textual models. None of these three papers defines an influence propagation model; instead, in a recent work, Barbieri et al. [?] extend the classic IC model to be topic-aware: the resulting model is named Topic-aware Independent Cascade (TIC).

Barbieri et al. also devise methods to learn, from a log of past propagations, the model parameters, i.e., topic-aware influence strength for each link and topic distribution for each item. Their experiments show that (1) topic-aware influence propagation models are more accurate in describing real-world influence-driven propagations than the state-of-the-art topic-blind models, and (2) by considering the characteristics of the item, a larger number of adoptions can be obtained in the influence maximization problem. The TIC model is assumed at the basis of our work and it is introduced in detail next.

¹ $\sigma(S) \le \sigma(T)$ whenever $S \subseteq T$.

² $\sigma(S \cup \{w\}) - \sigma(S) \ge \sigma(T \cup \{w\}) - \sigma(T)$ whenever $S \subseteq T$.

Figure 1: Topic-aware influence parameters are learnt from the log of past propagations and the social network following [?]. These are the prerequisites to build the INFLEX index that we use to efficiently answer TIM queries.

1.2 Problem definition

In this paper we study indexing schemes for answering Topic-aware Influence Maximization (TIM) queries. We are given a directed social graph $G = (V, A)$ and a space of Z topics. We assume the TIC propagation model introduced in [?], whose parameters are learned from a log of past propagation traces. In particular, for each arc $(u, v) \in A$ and for each topic $z \in [1, Z]$ we have a probability $p^z_{u,v}$ representing the strength of influence that user u exerts over user v for topic z. A high-level depiction of our setting is provided in Figure ??. An item i is described by a distribution $\vec{\gamma}_i$ over the topics: that is, for each topic $z \in [1, Z]$ we are given $\gamma^z_i$, with $\sum_{z=1}^{Z} \gamma^z_i = 1$. In the TIC model a propagation happens as in the IC model: when a node u first becomes active on item i, it has one chance of influencing each inactive neighbor v, independently of the history thus far. The attempt succeeds with a probability that is the weighted average of the link probability w.r.t. the topic distribution of the item i:

$$p^i_{u,v} = \sum_{z=1}^{Z} \gamma^z_i \, p^z_{u,v}. \tag{1}$$
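The weighted average in the equation above is a plain dot product between the item's topic distribution and the per-topic arc probabilities. A minimal sketch follows; the function name `item_arc_probability` and the list-based inputs are assumptions made for illustration.

```python
def item_arc_probability(gamma_i, p_uv_by_topic):
    """Item-specific influence probability of an arc (u, v) under TIC.

    gamma_i        : topic distribution of item i, one weight per topic (sums to 1)
    p_uv_by_topic  : per-topic influence probabilities p^z_{u,v} for the same arc
    Both argument layouts are hypothetical choices for this sketch.
    """
    if len(gamma_i) != len(p_uv_by_topic):
        raise ValueError("one probability per topic is required")
    if abs(sum(gamma_i) - 1.0) > 1e-9:
        raise ValueError("topic distribution must sum to 1")
    # weighted average of the per-topic probabilities
    return sum(g * p for g, p in zip(gamma_i, p_uv_by_topic))
```

For instance, an item split evenly over two topics with per-topic probabilities 0.2 and 0.4 yields an arc probability of about 0.3.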

A TIM query $Q(\vec{\gamma}_q, k)$ takes as input an item description $\vec{\gamma}_q$ and an integer k, and requires finding the seed set $S \subseteq V$, $|S| = k$, such that the expected number of nodes adopting item q, denoted by $\sigma(S, \vec{\gamma}_q)$, is maximum:

$$Q(\vec{\gamma}_q, k) = \arg\max_{S \subseteq V, |S| = k} \sigma(S, \vec{\gamma}_q). \tag{2}$$

[Figure: Propagation Log; (Unobserved) Social Network; Communities]

Our framework assumes the existence of an unobserved social network having a modular structure.

Each user is associated with a level of membership and influence in each community.

We can model the behavior of users by exploiting the standard mixture modeling approach:

Community-Independent Cascade (C-IC).

Generalization of the IC Model: each new active user v exerts her influence globally, with a strength that depends on the community k of the targeted node.

• Each user is associated with a level of membership and a level of influence in each community. These are the parameters of the diffusion model. The adoption of an item i by a user u depends on the influence exerted by other members of the community on u for adopting i.

• By fitting the model parameters to the activity log D, we learn the community membership and influence levels.

This general framework can be instantiated to different community-level influence diffusion models. We will introduce two such models in Section IV and V respectively. We conclude this section by presenting the EM-like algorithm for fitting the model parameters to the user activity log.

Modeling the likelihood. We assume that each propagation trace is independent from the others, and we adopt a maximum a-posteriori perspective. That is, we hypothesize that action probabilities adhere to a mathematical model governed by a set of parameters $\Theta$. Following the standard mixture modeling approach [11], we assume that users' actions can only happen relative to a community of membership. That is, we assume that a hidden binary variable $z_{u,k}$ denotes the membership of user u to community k, with the constraints $\sum_{k=1}^{K} z_{u,k} = 1$. Thus, $\Theta$ can be partitioned into $\{\pi_1, \ldots, \pi_K, \Theta_1, \ldots, \Theta_K\}$, where $\Theta_k$ represents the parameter set relative to community k, and $\pi_k \equiv P(z_{u,k} = 1)$. We can express the likelihood of the data as $L(\Theta; D) = \prod_u \sum_{k=1}^{K} P(u \mid \Theta_k)\, \pi_k$, which can be optimized by resorting to the traditional EM algorithm. Consider the complete-likelihood

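$$P(D, Z, \Theta) = P(D \mid Z, \Theta) \cdot P(Z \mid \Theta) \cdot P(\Theta) \tag{1}$$

where

$$P(D \mid Z, \Theta) = \prod_{u \in V} \prod_{k=1}^{K} P(u \mid \Theta_k)^{z_{u,k}}, \qquad P(Z \mid \Theta) = \prod_{u \in V} \prod_{k=1}^{K} \pi_k^{z_{u,k}},$$

and $P(\Theta)$ represents the prior relative to the parameter set $\Theta$. By standard manipulation of Eq. 1, the Complete-Data Expectation Likelihood [11] is given by:

$$\begin{aligned} Q(\Theta; \Theta') &= E_Z[\log P(D, Z, \Theta) \mid D; \Theta'] \\ &= \sum_{u \in V} \sum_{k=1}^{K} \gamma_{u,k} \{\log P(u \mid \Theta_k) + \log \pi_k\} + \log P(\Theta) \end{aligned} \tag{2}$$

where $\gamma_{u,k} \equiv P(z_{u,k} = 1 \mid u, \Theta')$.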

Optimizing the latter can be done by means of the EM algorithm: starting with an initial random assignment of $\Theta$, the algorithm iteratively performs two steps until convergence:

• (E step) Given $\Theta$, estimate $\gamma_{u,k}$ for each u, k as

$$\gamma_{u,k} = \frac{P(u \mid \Theta_k)\, \pi_k}{\sum_{k'=1}^{K} P(u \mid \Theta_{k'})\, \pi_{k'}}$$

• (M step) Given $\gamma_{u,k}$, find the $\Theta$ maximizing Eq. 2.

This general scheme is parametric to both the prior $P(\Theta)$ and the component $P(u \mid \Theta_k)$. We model the former in a way similar to [12], in order to allow an automatic estimation of the optimal number K of communities. As for the latter, it depends on the way we model the probability $P(a)$ for given actions $a \equiv (u, i, t)$. We explore two different alternatives.
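The E step of this scheme can be sketched in a few lines of Python. Working with log-likelihoods and subtracting the row maximum before exponentiating is a numerical-stability choice made for this sketch, not something stated in the text; the function name `e_step` and the nested-list inputs are likewise assumptions.

```python
import math

def e_step(log_lik, pi):
    """Compute responsibilities gamma[u][k] from per-community log-likelihoods.

    log_lik[u][k] : log P(u | Theta_k)   (hypothetical input layout)
    pi[k]         : current mixing weight of community k
    """
    gamma = []
    for row in log_lik:
        # log of P(u | Theta_k) * pi_k; annihilated communities get -inf
        scores = [ll + math.log(p) if p > 0 else float("-inf")
                  for ll, p in zip(row, pi)]
        m = max(scores)
        if m == float("-inf"):
            raise ValueError("user has zero likelihood under every community")
        weights = [math.exp(s - m) for s in scores]   # shift by max to avoid underflow
        z = sum(weights)
        gamma.append([w / z for w in weights])
    return gamma
```

Each row of the result sums to one, and a community with $\pi_k = 0$ receives zero responsibility for every user.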





When the social relationships are explicit, it is possible to define a propagation model which describes how adoptions spread across the network [5] and to model information propagation and community structure suitably [10]. In these models, a user's tendency to become active increases monotonically as more of its social peers become active. We next adapt this concept to a network-oblivious situation, where we assume that the user's tendency to become active depends on the influence exerted within the community of membership.

The Community-Independent Cascade (C-IC) model draws from the Independent Cascade model (IC) [5], and models the idea that each user exerts the same degree of influence over members of each community. Time unfolds in discrete timestamps. As in IC, when a user v becomes active, say at time t, it is considered contagious and has a single chance of influencing each inactive neighbor u, independently of the history thus far. We assume that v exerts her influence "globally", with a strength $p^k_v \in [0, 1]$ which depends on the community k of the targeted node. The idea is that the community-level influence of each user v is higher in the community she belongs to. According to this principle, we assume that information mainly propagates locally and spreads across communities thanks to the presence of users who exhibit a high degree of "external" influence.

Following [13], we adopt a delay threshold $\Delta$ to define influencers. Specifically, we define $F^+_{i,u} = \{v \in V \mid 0 \le t_u(i) - t_v(i) \le \Delta\}$ as the set of users who potentially influenced u in the adoption of i. Similarly, we define the set $F^-_{i,u} = \{v \in V \mid t_u(i) - t_v(i) > \Delta\}$ of users who definitely failed in influencing u over i. Then, we can specify $P(u \mid \Theta_k)$ as

$$P(u \mid \Theta_k) = \prod_{i} P_+(i \mid u, \Theta_k) \cdot P_-(i \mid u, \Theta_k), \tag{3}$$

where $P_+(i \mid u, \Theta_k)$ represents the probability that some of the potential influencers activated u and $P_-(i \mid u, \Theta_k)$ the probability that none of the "out-of-react" influencers succeeded:

$$P_+(i \mid u, \Theta_k) = 1 - \prod_{v \in F^+_{i,u}} (1 - p^k_v), \qquad P_-(i \mid u, \Theta_k) = \prod_{v \in F^-_{i,u}} (1 - p^k_v)$$

We can then specify the complete-data likelihood through:

$$P(D \mid Z, \Theta) = \prod_{i,u,k} \Big[1 - \prod_{v \in F^+_{i,u}} (1 - p^k_v)\Big]^{z_{u,k}} \cdot \Big[\prod_{v \in F^-_{i,u}} (1 - p^k_v)\Big]^{z_{u,k}}$$

Probability that some of the potential influencers succeeds in activating u.
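The per-item factor of the C-IC likelihood can be illustrated with a short Python sketch. All names here (`influencer_sets`, `log_lik_item`, the dictionary inputs) are hypothetical, and two choices are assumptions of this sketch rather than statements from the text: user u is excluded from her own influencer set, and all $p^k_v$ are assumed strictly below 1 so the logarithms are defined.

```python
import math

def influencer_sets(times_i, u, delta):
    """Split earlier adopters of item i into F+ (within delta) and F- (too early).

    times_i : dict user -> adoption time of item i (hypothetical layout)
    """
    t_u = times_i[u]
    # assumption of this sketch: u is not counted among her own influencers
    f_plus = {v for v, t_v in times_i.items() if v != u and 0 <= t_u - t_v <= delta}
    f_minus = {v for v, t_v in times_i.items() if t_u - t_v > delta}
    return f_plus, f_minus

def log_lik_item(times_i, u, p_k, delta):
    """log of P+(i|u,Theta_k) * P-(i|u,Theta_k), with p_k: dict v -> p_v^k (< 1)."""
    f_plus, f_minus = influencer_sets(times_i, u, delta)
    all_fail = math.prod(1.0 - p_k[v] for v in f_plus)
    p_plus = 1.0 - all_fail            # at least one potential influencer succeeded
    if p_plus <= 0.0:
        return float("-inf")           # no potential influencer can explain the adoption
    log_p_minus = sum(math.log1p(-p_k[v]) for v in f_minus)
    return math.log(p_plus) + log_p_minus
```

Summing `log_lik_item` over the items adopted by u gives the logarithm of $P(u \mid \Theta_k)$ in Eq. 3.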





Probability that none of the "out-of-react" influencers succeeds in activating u.

Learning influence weights. The analytical optimization of $Q(\Theta; \Theta')$ is still difficult. We resort to the explicit modeling of the influencers as hidden data to simplify the optimization procedure. That is, let $w_{i,u,v}$ be a binary variable such that $w_{i,u,v} = 1$ if v triggered the adoption of the item i by u, and let W denote the set of all possible $w_{i,u,v}$ such that $v \in F^+_{i,u}$. Then, we can rewrite the complete-data likelihood relative to W as

$$P(D, Z, W, \Theta) = P(D, W \mid \Theta, Z) \cdot P(Z \mid \Theta) \cdot P(\Theta),$$

where

$$P(D, W \mid \Theta, Z) = \prod_{i,u,k} \prod_{v \in F^-_{i,u}} (1 - p^k_v)^{z_{u,k}} \cdot \prod_{i,u,k} \prod_{v \in F^+_{i,u}} \big(p^k_v\big)^{w_{i,u,v} \cdot z_{u,k}} \big(1 - p^k_v\big)^{(1 - w_{i,u,v}) \cdot z_{u,k}}$$

As a consequence, the contribution to $Q(\Theta; \Theta')$ in the second row of Eq. 2 can be rewritten as

$$\sum_{u} \sum_{k} \gamma_{u,k} \Bigg( \log \pi_k + \sum_{i} \sum_{v \in F^-_{i,u}} \log(1 - p^k_v) + \sum_{i} \sum_{v \in F^+_{i,u}} \Big[ \eta_{i,u,v,k} \log p^k_v + (1 - \eta_{i,u,v,k}) \log(1 - p^k_v) \Big] \Bigg)$$

where $\eta_{i,u,v,k}$ is the "responsibility" of the user v in triggering u's adoption in the context of the community k:

$$\eta_{i,u,v,k} = P(w_{i,u,v} = 1 \mid u, i, z_{u,k} = 1, \Theta') = \frac{p^k_v}{1 - \prod_{w \in F^+_{i,u}} (1 - p^k_w)}.$$

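Finally, optimizing $Q(\Theta; \Theta')$ with respect to $p^k_v$ yields

$$p^k_v = \frac{\sum_{\langle u,i \rangle : v \in F^+_{i,u}} \gamma_{u,k} \cdot \eta_{i,u,v,k}}{S^+_{v,k} + S^-_{v,k}}, \tag{4}$$

with $S^+_{v,k} = \sum_{\langle u,i \rangle : v \in F^+_{i,u}} \gamma_{u,k}$ and $S^-_{v,k} = \sum_{\langle u,i \rangle : v \in F^-_{i,u}} \gamma_{u,k}$.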
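A minimal Python sketch of the update in Eq. 4 for a single community k follows. The names `responsibilities` and `update_influence`, the encoding of the log as a list of `(u, F_plus, F_minus)` triples (one per adoption pair), and the fallback of keeping the old value for users with no evidence are assumptions made for this illustration.

```python
def responsibilities(f_plus, p_k):
    """eta_{i,u,v,k} for each v in F+_{i,u}, computed from the current p_v^k."""
    all_fail = 1.0
    for w in f_plus:
        all_fail *= 1.0 - p_k[w]
    denom = 1.0 - all_fail
    return {v: (p_k[v] / denom if denom > 0 else 0.0) for v in f_plus}

def update_influence(adoptions, gamma_k, p_k):
    """One M-step update of p_v^k for a fixed community k (sketch of Eq. 4).

    adoptions : list of (u, F_plus, F_minus), one entry per adoption <u, i>
    gamma_k   : dict u -> gamma_{u,k}
    p_k       : dict v -> current p_v^k
    """
    num, s_plus, s_minus = {}, {}, {}
    for u, f_plus, f_minus in adoptions:
        eta = responsibilities(f_plus, p_k)
        for v in f_plus:
            num[v] = num.get(v, 0.0) + gamma_k[u] * eta[v]
            s_plus[v] = s_plus.get(v, 0.0) + gamma_k[u]
        for v in f_minus:
            s_minus[v] = s_minus.get(v, 0.0) + gamma_k[u]
    new_p = {}
    for v in set(s_plus) | set(s_minus):
        denom = s_plus.get(v, 0.0) + s_minus.get(v, 0.0)
        # keep the old value when v has no weighted evidence (choice of this sketch)
        new_p[v] = num.get(v, 0.0) / denom if denom > 0 else p_k[v]
    return new_p
```

Because each responsibility is at most 1, the numerator never exceeds $S^+_{v,k}$, so the updated value stays in [0, 1].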

V. MODELING TEMPORAL DYNAMICS

C-IC does not explicitly model temporal dynamics, as it focuses on modeling just binary activations by employing a discrete-time propagation model. Here we present an alternative modeling that exploits time delays to characterize the overall diffusion process.

Given an observation window [0, T], the idea is to explicitly model the likelihood of the time at which each user adopted each item, or the likelihood that the considered adoption did not happen within time T. This approach assumes that there is a dependency between the adoption time of the influencer and the one of the influenced. In NetRate [4], previously described in Sec. II, this dependency is modeled by a conditional likelihood $f(t_u \mid t_v, \alpha_{v,u})$ of transmission, which depends on the delay $\Delta_{v,u}$. The likelihood of a propagation can be formulated by applying standard survival analysis [14], in terms of survival $S(t_u \mid t_v, \alpha_{v,u})$ (modeling the probability that a user survives uninfected at least until time $t_u$) and hazard functions $H(t_u \mid t_v, \alpha_{v,u})$ (modeling instantaneous infections).

We reformulate this framework into a community-based scenario. The Community-Rate (C-Rate) propagation model is characterized by the following assumptions:

• A user's influence is limited to the community she belongs to. That is, the user is likely to influence/be influenced by members of the same community, while the effect of influence is marginal on members of a different community.

• Information diffusion from user v to user u within the k-th community is characterized by the density $f(t_u \mid t_v, \alpha_{v,k})$, where $\alpha_{v,k}$ is related to the expected delay on the activations that v triggers within community k. The probability of contagion depends on the time delay $\Delta_{v,u}$.

The parameter $\alpha_{v,k}$ has a direct interpretation in terms of influence: high values of $\alpha_{v,k}$ cause short delays, and as a consequence denote v as strongly influential within k.

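On the basis of the above observations, we can adapt the NetRate model to fit the scheme of Sec. III, by plugging

$$P(u \mid \Theta_k) = \prod_{i : u \notin C_i} \prod_{v \in C_i} S(T \mid t_v(i), \alpha_{v,k}) \cdot \prod_{i : u \in C_i} \Bigg[ \prod_{v \in C_{i,t_u(i)}} S(t_u(i) \mid t_v(i), \alpha_{v,k}) \sum_{v \in C_{i,t_u(i)}} H(t_u(i) \mid t_v(i), \alpha_{v,k}) \Bigg] \tag{5}$$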

Learning. Again, instead of directly optimizing the above likelihood, we introduce the latent binary variable $w_{i,u,v}$ denoting the fact that u has been infected by v on i. Then, the likelihood can be rewritten by defining

$$P(D, W \mid Z, \Theta) = \prod_{\langle u,i \rangle \notin D} \prod_{k} \prod_{v \in C_i} S(T \mid t_v(i), \alpha_{v,k})^{z_{u,k}} \cdot \prod_{\langle u,i \rangle \in D} \prod_{k} \prod_{v \in C_{i,t_u(i)}} H(t_u(i) \mid t_v(i), \alpha_{v,k})^{w_{i,u,v} z_{u,k}} \cdot S(t_u(i) \mid t_v(i), \alpha_{v,k})^{z_{u,k}}$$

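and replacing $P(D \mid Z, \Theta)$ with the above component in the likelihood. In the following we adopt the exponential distribution $f(t_u \mid t_v, \alpha_{v,k}) = \alpha_{v,k} \exp\{-\alpha_{v,k} \Delta_{v,u}\}$, which enables $S(t_u \mid t_v, \alpha_{v,k}) = \exp\{-\alpha_{v,k} \Delta_{v,u}\}$ and $H(t_u \mid t_v, \alpha_{v,k}) = \alpha_{v,k}$.¹ Then,

$$\begin{aligned} Q(\Theta; \Theta') \propto{} & \sum_{u,k} \gamma_{u,k} \log \pi_k - \sum_{\langle u,i \rangle \notin D} \sum_{k} \sum_{v \in C_i} \gamma_{u,k}\, \Delta_{v}\, \alpha_{v,k} \\ & + \sum_{\langle u,i \rangle \in D} \sum_{k} \sum_{v \in C_{i,t_u(i)}} \eta_{i,u,v,k}\, \gamma_{u,k} \log \alpha_{v,k} - \sum_{\langle u,i \rangle \in D} \sum_{k} \sum_{v \in C_{i,t_u(i)}} \gamma_{u,k}\, \Delta_{u,v}\, \alpha_{v,k}, \end{aligned}$$

¹ Similar formulations can be obtained by adopting different densities and are omitted here for lack of space.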
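The exponential choice can be checked numerically with a small Python sketch. The function names and dictionary inputs below are hypothetical; the code only illustrates the relation between density, survival and hazard under the exponential model, and how the two kinds of per-item log-likelihood terms (adopted within the window, or not adopted by time T) are formed.

```python
import math

def density(delay, alpha):
    """Exponential transmission density f = alpha * exp(-alpha * delay)."""
    return alpha * math.exp(-alpha * delay)

def survival(delay, alpha):
    """Probability of surviving uninfected for `delay` time units."""
    return math.exp(-alpha * delay)

def hazard(delay, alpha):
    """Hazard = density / survival; constant and equal to alpha here."""
    return density(delay, alpha) / survival(delay, alpha)

def log_lik_adopted(delays, alphas):
    """Log-likelihood of an adoption given earlier adopters.

    delays[v] : delay between v's adoption and u's adoption (hypothetical layout)
    alphas[v] : rate alpha_{v,k} of v in the community under consideration
    """
    log_survival = -sum(alphas[v] * d for v, d in delays.items())
    total_hazard = sum(alphas[v] for v in delays)
    return log_survival + math.log(total_hazard)

def log_lik_not_adopted(residuals, alphas):
    """Log-likelihood that u survived every adopter v until the end of the window.

    residuals[v] : time between v's adoption and the end T of the window
    """
    return -sum(alphas[v] * r for v, r in residuals.items())
```

With a constant hazard, the logarithm of the likelihood is linear in the rates apart from the single log-sum term, which is what keeps the expected log-likelihood simple to optimize.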






Expectation-Maximization algorithm to determine the parameters that maximize

Modeling temporal dynamics with C-Rate.






The likelihood of an activation can be formulated by applying survival analysis:

$$
P(u \mid \Theta_k) = \prod_{i:\, u \notin C_i} \prod_{v \in C_i} S(T \mid t_v(i), \alpha_{v,k}) \cdot \prod_{i:\, u \in C_i} \prod_{v \in C_{i,t_u(i)}} S(t_u(i) \mid t_v(i), \alpha_{v,k}) \sum_{v \in C_{i,t_u(i)}} H(t_u(i) \mid t_v(i), \alpha_{v,k}) \qquad (5)
$$
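To make Eq. (5) concrete, the sketch below evaluates its logarithm for one user and one community under the exponential model, where the survival term is $\exp\{-\alpha\,\Delta\}$ and the hazard is the constant rate. Several points are assumptions of this sketch rather than statements from the paper: a trace is a dictionary mapping users to activation times, $C_i$ is read as the set of users active in trace $i$, $C_{i,t_u(i)}$ as those active strictly before $u$, and traces started by $u$ (no earlier user) are simply skipped.

```python
import math

def log_p_user(u, traces, alpha, horizon):
    """log P(u | Theta_k) following Eq. (5) with exponential S and constant H.

    traces: list of {user: activation time}; alpha: {user: rate in community k};
    horizon: observation window T.
    """
    total = 0.0
    for times in traces:
        if u not in times:
            # u never activates: it survives every active v until the horizon T.
            total += sum(-alpha[v] * (horizon - t_v) for v, t_v in times.items())
            continue
        t_u = times[u]
        earlier = {v: t_v for v, t_v in times.items() if t_v < t_u}
        if not earlier:
            continue  # assumption: traces started by u contribute nothing
        # Survival with respect to every earlier user, up to u's activation.
        total += sum(-alpha[v] * (t_u - t_v) for v, t_v in earlier.items())
        # Sum of hazards: any earlier user may have been the influencer.
        total += math.log(sum(alpha[v] for v in earlier))
    return total

traces = [{"a": 0.0, "b": 1.0}, {"a": 0.0}]
alpha = {"a": 0.5, "b": 0.2}  # hypothetical rates
print(log_p_user("b", traces, alpha, horizon=10.0))
```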




By adopting the exponential distribution as the density for the conditional transmission likelihood and by introducing hidden variables for modeling the identity of the influencer, we obtain:

$$
P(D, W \mid Z, \Theta) = \prod_{\langle u,i\rangle \notin D} \prod_{k} \prod_{v \in C_i} S(T \mid t_v(i), \alpha_{v,k})^{z_{u,k}} \cdot \prod_{\langle u,i\rangle \in D} \prod_{k} \prod_{v \in C_{i,t_u(i)}} H(t_u(i) \mid t_v(i), \alpha_{v,k})^{w_{i,u,v}\, z_{u,k}} \cdot S(t_u(i) \mid t_v(i), \alpha_{v,k})^{z_{u,k}}
$$

The exponential density is $f(t_u \mid t_v, \alpha_{v,k}) = \alpha_{v,k} \exp\{-\alpha_{v,k}\,\Delta_{v,u}\}$, which enables $S(t_u \mid t_v, \alpha_{v,k}) = \exp\{-\alpha_{v,k}\,\Delta_{v,u}\}$ and $H(t_u \mid t_v, \alpha_{v,k}) = \alpha_{v,k}$.$^1$ Then,

$$
Q(\Theta;\Theta') \propto \sum_{u,k} \gamma_{u,k} \log \pi_k
- \sum_{\langle u,i\rangle \notin D} \sum_{k} \sum_{v \in C_i} \gamma_{u,k}\, \Delta_{v}\, \alpha_{v,k}
+ \sum_{\langle u,i\rangle \in D} \sum_{k} \sum_{v \in C_{i,t_u(i)}} \eta_{i,u,v,k}\, \gamma_{u,k} \log \alpha_{v,k}
- \sum_{\langle u,i\rangle \in D} \sum_{k} \sum_{v \in C_{i,t_u(i)}} \gamma_{u,k}\, \Delta_{u,v}\, \alpha_{v,k},
$$

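$^1$ Similar formulations can be obtained by adopting different densities and are omitted here for lack of space.

$S(t_u \mid t_v, \alpha_{v,k})$: modeling the probability that a user survives uninfected at least until time $t_u$.

The joint distribution factorizes as

$$
P(D, Z, W, \Theta) = P(D, W \mid \Theta, Z) \cdot P(Z \mid \Theta) \cdot P(\Theta),
$$

where, in the C-IC model,

$$
P(D, W \mid \Theta, Z) = \prod_{i,u,k} \prod_{v \in F^-_{i,u}} \left(1 - p^k_v\right)^{z_{u,k}} \cdot \prod_{i,u,k} \prod_{v \in F^+_{i,u}} \left(p^k_v\right)^{w_{i,u,v} \cdot z_{u,k}} \left(1 - p^k_v\right)^{(1 - w_{i,u,v}) \cdot z_{u,k}}.
$$

The expected complete-data log-likelihood $Q(\Theta;\Theta')$ contains the sum

$$
\sum_{u} \sum_{k} \gamma_{u,k} \left( \log \pi_k + \sum_{i} \sum_{v \in F^-_{i,u}} \log\left(1 - p^k_v\right) + \sum_{i} \sum_{v \in F^+_{i,u}} \left[ \eta_{i,u,v,k} \log p^k_v + \left(1 - \eta_{i,u,v,k}\right) \log\left(1 - p^k_v\right) \right] \right),
$$

where, for $v \in F^+_{i,u}$,

$$
\eta_{i,u,v,k} = P(w_{i,u,v} = 1 \mid u, i, z_{u,k} = 1, \Theta') = \frac{p^k_v}{1 - \prod_{w \in F^+_{i,u}} \left(1 - p^k_w\right)}.
$$

Learning influence weights. The analytical optimization of $Q(\Theta;\Theta')$ with respect to $p^k_v$ yields

$$
p^k_v = \frac{\sum_{\langle u,i\rangle :\, v \in F^+_{i,u}} \gamma_{u,k} \cdot \eta_{i,u,v,k}}{S^+_{v,k} + S^-_{v,k}}, \qquad (4)
$$

with $S^+_{v,k} = \sum_{\langle u,i\rangle :\, v \in F^+_{i,u}} \gamma_{u,k}$ and $S^-_{v,k} = \sum_{\langle u,i\rangle :\, v \in F^-_{i,u}} \gamma_{u,k}$.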
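The update in Eq. (4) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the sets $F^+_{i,u}$ and $F^-_{i,u}$ are taken as precomputed inputs, the responsibilities $\gamma_{u,k}$ are given for a single community $k$, all current probabilities lie strictly between 0 and 1, and the function and variable names are hypothetical.

```python
from collections import defaultdict

def influencer_posterior(p_k, f_plus):
    """eta for each v in F+: p_v / (1 - prod over w in F+ of (1 - p_w))."""
    miss = 1.0
    for w in f_plus:
        miss *= 1.0 - p_k[w]
    return {v: p_k[v] / (1.0 - miss) for v in f_plus}

def update_influence(p_k, gamma_k, events):
    """One update of the influence weights of community k, following Eq. (4).

    events: list of (u, F_plus, F_minus), one entry per pair <u, i>.
    """
    numerator = defaultdict(float)
    s_plus = defaultdict(float)
    s_minus = defaultdict(float)
    for u, f_plus, f_minus in events:
        if f_plus:
            eta = influencer_posterior(p_k, f_plus)
            for v in f_plus:
                numerator[v] += gamma_k[u] * eta[v]
                s_plus[v] += gamma_k[u]
        for v in f_minus:
            s_minus[v] += gamma_k[u]
    return {v: numerator[v] / (s_plus[v] + s_minus[v])
            for v in set(s_plus) | set(s_minus)}

p_k = {"a": 0.5, "b": 0.5}          # current influence weights (hypothetical)
gamma_k = {"u": 1.0, "w": 1.0}      # responsibilities for community k
events = [("u", ["a", "b"], []), ("w", ["a"], ["b"])]
print(update_influence(p_k, gamma_k, events))
```

In this toy input, user `a` appears only in successful sets, so its weight grows, while `b` fails once and its weight shrinks.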






$H(t_u \mid t_v, \alpha_{v,k})$: modeling instantaneous infections.

Evaluation on Synthetic Data.

(a) µ = 0.001 (b) µ = 0.01

(c) µ = 0.05 (d) µ = 0.1

Fig. 1: Synthetic Networks.

standard measure (e.g., conductance, cut ratio, modularity, etc.) using the network.

Synthetic Datasets. We generate synthesized data in two steps. First, we generate a network which exhibits a known community structure, as well as structural features typical of real networks. To this aim, we use the generator of benchmark graphs described in [1], which generates directed unweighted graphs with possibly overlapping communities. The process of network generation is controlled by the following parameters: (i) number of nodes (1,000); (ii) average in-degree (10); (iii) maximum in-degree (150); (iv) min/max of the community sizes (50/750). The four networks differ in the percentage µ of overlapping memberships, which takes the values 0.001, 0.01, 0.05 and 0.1. This last parameter, as is clearly visible from the topology of the generated networks reported in Fig. 1, strongly affects the structure of the network, which ranges from well-separated (but still connected) components to strongly overlapping ones.

Given a network G = (V,E), the next step is to generate synthetic propagation cascades by simulating a propagation/contagion process which spreads over E. Again, we parameterize the propagation strategy, and study the behavior of each algorithm by varying this parameter. The overall data generation schema generates |I| propagation traces based on the following protocol. Given a network G = (V,E) with a known community structure, for each community k, an initial dummy node is connected to all nodes within the considered community, with a random influence weight sampled from [0.02, 0.05]. For each trace we generate a random permutation of the dummy nodes and, after selecting the first one, the n-th community node is picked randomly with probability �^n. This initialization step determines the degree to which the trace to be generated will be local/global. At time t = 0, the dummy nodes determine the activation of real nodes, from which we start the subsequent diffusion process. At this stage, information can spread on the network by exploiting the links.

TABLE I: Statistics for the synthetic data: four networks corresponding to four values of µ as in Figure 1.

                                              S1     S2     S3     S4
# of communities (K)                           9      7     11      6
avg # of adoptions                           56k    59k    82k   370k
avg trace length                              38     38     54    256
avg % of communities traversed by a trace    17%    24%    24%    82%

The strength of each link is determined by considering both the outdegree ($out_{\cdot}$) of the source and the indegree ($in_{\cdot}$) of the destination:

$$
weight(u, v) \propto � \cdot \frac{out_u}{out} \cdot \frac{in_v}{in} + (1 - �) \cdot rand(0.1, 1)
$$

where $out$ and $in$ are the maximum out-degree and in-degree respectively, and � introduces a random effect.
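The weighting scheme can be sketched as follows. The mixing coefficient is called `mix` here, a name chosen for this sketch only; the edge list and the seed are likewise illustrative assumptions, and the proportionality constant of the formula is left out.

```python
import random

def link_weights(edges, mix, rng):
    """Unnormalized link weights.

    weight(u, v) = mix * (out_u / out_max) * (in_v / in_max)
                   + (1 - mix) * rand(0.1, 1)
    """
    out_deg, in_deg = {}, {}
    for u, v in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    out_max = max(out_deg.values())
    in_max = max(in_deg.values())
    return {
        (u, v): mix * (out_deg[u] / out_max) * (in_deg[v] / in_max)
                + (1.0 - mix) * rng.uniform(0.1, 1.0)
        for u, v in edges
    }

edges = [("a", "b"), ("a", "c"), ("b", "c")]  # toy directed graph
print(link_weights(edges, mix=0.9, rng=random.Random(0)))
```

With `mix = 1` the weight depends on the degrees alone; lowering it lets the random term dominate.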

In the propagation process, the weight of each link represents a Bernoulli probability of infection. For each link we also generate a typical infection rate $\alpha_{u,v}$, sampled from a Gamma distribution (shape=2, scale=0.3).

To summarize, the synthesized data depends on the degree of community overlapping µ, the degree of propagation overlap � and the size |I| of the propagation log. In this first experiment, we fix � = 0.9, � = 0.2 and |I| = 1,500, and vary the µ parameter as discussed above. For each network, we randomly generate 5 propagation logs. The main properties of the synthetically generated data are summarized in Table I.

Baselines. The C-IC and C-Rate techniques are compared to some baseline models. The first two baselines build on the idea of network reconstruction. Given a log of past propagations D, we can apply either NetRate or the Independent Cascade inference procedure [2] (assuming the complete graph). Both algorithms provide a set of link weights as output, and higher weights witness the existence of strong connections. We reconstruct the network by applying a sparsification procedure based on the identification of a threshold value, accomplished by analyzing the distribution of the weights. Finally, communities are discovered by applying the Metis algorithm [4], which is reported to achieve good performance and is fast. These baselines are denoted as NetRate/Metis and IC/Metis.

A further baseline is a standard clustering algorithm that groups users according to their commonly adopted items. The algorithm is based on a multinomial expectation maximization procedure. It should be noted, however, that the clustering algorithm does not provide a method for measuring the degree of influence of a user within a community, as the algorithms proposed in this paper do.

Results. We measure the quality of the discovered communities w.r.t. the known ground truth communities using the normalized � index, the Adjusted Rand Index [?], as well as the F-Measure and the Normalized Mutual Information [?]. For all the considered approaches, we report the average quality indices, as well as the standard deviation relative to the 5 propagation logs, in Fig. ??. As we can see in the figure, both C-IC and C-Rate perform particularly well on all four

•  Number of nodes = 1000;
•  Average in-degree = 10;
•  Maximum in-degree = 150;
•  Min/max of the community sizes = 50/750.

The four networks differ in the percentage μ of overlapping memberships.

We use a generator of benchmark graphs [1], which generates directed unweighted graphs with possibly overlapping communities.

[1] A. Lancichinetti and S. Fortunato. Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities. Physical Review E, 80, 2009.

•  Propagation cascades are generated according to the Net-Rate propagation model.

•  The transmission rate for each link is sampled from a Gamma distribution (shape=2, scale=0.3).

Baseline Models

•  Based on network reconstruction (assuming a dense graph):
    •  Inference for the IC Model [2];
    •  Net-Rate [3];
    •  Communities are detected by applying METIS [4] on the reconstructed graph.
•  Multinomial EM

Results.

[2] K. Saito, R. Nakano, and M. Kimura. Prediction of information diffusion probabilities for independent cascade model. KES'08.
[3] M. Gomez-Rodriguez, D. Balduzzi, B. Schölkopf. Uncovering the Temporal Dynamics of Diffusion Networks. ICML 2011.
[4] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, vol. 20, no. 1, pp. 359–392, 1999.

Evaluation on real data.

contains traces of 8,541 URLs for a total number of 516,412 adoptions (tweets). The average number of users per trace is 60, and on average a user performs 18 tweets.

Relying on a dataset which discloses the underlying network allows us to measure the correspondence between the discovered structure and the latent true structure. Notice however that, since we do not have any form of ground truth for the community structure exhibited by the network under analysis, we have to rely on empirical objective functions to assess the quality of the retrieved communities. Intuitively, for a given partition of the network into communities, such measures should promote communities which have a higher internal connectivity than the external one. In the following, we consider 3 different scores for measuring the quality of each community of a partition (see [?] for a detailed discussion):

• Conductance, the simplest formalization of the concept above, as it measures the ratio between the number of edges inside the community and the number of edges traversing its border.

• Internal Density, the ratio between the actual edges and the possible edges within the community.

• Cut Ratio, the ratio between the number of edges on the boundary of a cluster and all the possible ones.
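The three scores can be computed for one community as in the sketch below. The exact normalizations are assumptions: conductance is taken in one common convention (boundary edges over all edge endpoints attached to the community, so that lower is better), internal density divides by the number of unordered node pairs, and cut ratio divides by the number of possible boundary pairs. The variants used for the reported numbers may differ.

```python
def community_scores(edges, community, n_nodes):
    """Scores of one community on an undirected view of the graph.

    edges: iterable of node pairs; community: set of nodes (at least two,
    with at least one incident edge); n_nodes: number of nodes in the graph.
    """
    inside = boundary = 0
    for u, v in edges:
        hits = (u in community) + (v in community)
        if hits == 2:
            inside += 1
        elif hits == 1:
            boundary += 1
    size = len(community)
    return {
        "conductance": boundary / (2 * inside + boundary),
        "internal_density": inside / (size * (size - 1) / 2),
        "cut_ratio": boundary / (size * (n_nodes - size)),
    }

edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]  # toy graph
print(community_scores(edges, {1, 2, 3}, n_nodes=5))
```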

Notice that the above measures evaluate the connectivity of a community without considering the direction of the edges. We thus also adopt the directed version of the modularity measure [?]. Modularity compares the structure of the graph to that resulting from a random graph, representing a null model, and is defined as

$$
Q_G = \frac{1}{m} \sum_{u,v} \left[ A_{u,v} - E(u, v) \right] \delta_{c_u,c_v}
$$

In the above equation, $A_{u,v}$ is the cell of the adjacency matrix corresponding to the pair $(u, v)$, and $E(u, v) = out_u\, in_v / m$ represents the expected likelihood of observing the link $(u, v)$ in the null (random graph) model. Also, $\delta_{c_u,c_v}$ is the Kronecker delta relative to community memberships for nodes $u$ and $v$.
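The definition of $Q_G$ translates directly into code. The sketch below assumes that $m$ is the number of directed edges, that the edge list has no duplicates, and that every node has exactly one community label; the toy graph is illustrative. The double loop is quadratic in the number of nodes and is meant for small examples only.

```python
def directed_modularity(edges, membership):
    """Directed modularity Q_G = (1/m) * sum over same-community pairs (u, v)
    of A_uv - out_u * in_v / m.

    edges: list of distinct directed pairs (u, v); membership: {node: community}.
    """
    m = len(edges)
    out_deg, in_deg = {}, {}
    for u, v in edges:
        out_deg[u] = out_deg.get(u, 0) + 1
        in_deg[v] = in_deg.get(v, 0) + 1
    edge_set = set(edges)
    total = 0.0
    for u in membership:
        for v in membership:
            if membership[u] != membership[v]:
                continue  # Kronecker delta is zero across communities
            a_uv = 1.0 if (u, v) in edge_set else 0.0
            expected = out_deg.get(u, 0) * in_deg.get(v, 0) / m
            total += a_uv - expected
    return total / m

edges = [("a", "b"), ("b", "a"), ("c", "d"), ("d", "c")]
membership = {"a": 0, "b": 0, "c": 1, "d": 1}
print(directed_modularity(edges, membership))
```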

Higher modularity and internal density, as well as lower conductance and cut ratio, denote good partitioning. Table II reports the values of these measures, as well as the final number of communities discovered by the algorithms on the Twitter dataset. The values of conductance, internal density and cut ratio were averaged over all the discovered communities.

A first sign of the quality of the partitioning is the internal density, which is an order of magnitude higher than the density of the whole graph (0.0041). Also, a large number of edges tend to stay within the community, as witnessed by the conductance value. Finally, values of modularity around 0.2-0.3 are usually considered a good indication of a community structure. The existence of a good community structure is confirmed by the diagonal block structure in the adjacency matrices in Figure 4. These are adjacency matrices where the users are sorted and grouped in blocks representing the communities.

TABLE II: Summary of the evaluation on real data.

                                   C-IC             C-Rate
Communities                        20               64
Community size (min/max/median)    156/3651/1319    97/1758/328
Q_G                                0.3274           0.2424
Conductance                        0.5849           0.6791
Internal Density                   0.031            0.051
Cut Ratio                          0.001            0.0009
Time (mins)                        105              122

Fig. 4: Adjacency matrices for C-IC and C-Rate.

Figure ?? shows the parameters $p^k_v$ produced by C-IC. The plot shows that there is a general tendency to express influence within the same community of membership: that is, a user u belonging to community k is likely to influence users within k, and is unlikely to influence users in other communities. Interestingly, some communities in C-IC tend to be correlated, as influence values are high on certain blocks in addition to the diagonal. We do not report the same plot for C-Rate, due to visibility problems caused by the larger number of communities.

Finally, Fig. ?? shows the distribution of the communities touched by a single trace. Values tend to concentrate on few communities. The plots also show the distribution of the normalized entropy of a single trace, computed by considering the

Twitter data

•  Number of nodes = 28,185;
•  Number of links = 1,636,451;
•  Number of propagations (URLs) = 8,541;
•  Tweets = 516,412.

Internal density is an order of magnitude higher than the density of the whole graph (0.0041).

Modularity and the diagonal block structure of the adjacency matrix confirm the existence of a good community structure.

THANKS!
