Influence at Scale: Distributed Computation of Complex Contagion in Networks

Brendan Lucier (Microsoft Research), Joel Oren (University of Toronto), Yaron Singer (Harvard University)

Motivation

• Information requirements: how much of the graph do we need to examine? (query complexity).
• Efficient parallel computation: how to distribute computation; maintain high-volume intermediate data.

The Framework

The influence model: The Independent Cascade Model [KKT'03]
• Input: edge-weighted graph G = (V, E, p), initial seeds S_0 ⊆ V.
• At every synchronous step t > 0, every newly infected node u ∈ S_t − S_{t−1} successfully infects each uninfected neighbor v w.p. p_uv. S_{t+1} is the set of infected nodes at step t + 1.
• The influence function: f(S_0) = E[|S_t|], the expected number of infected nodes.
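The cascade process above can be sketched in a few lines of Python. This is a minimal illustration, assuming the graph is stored as a dict mapping each node to a list of (neighbor, probability) pairs; the names `simulate_cascade` and `estimate_influence` and the graph layout are hypothetical, and the plain Monte Carlo average shown here is the naive estimator of f(S_0), not the algorithm presented on the poster.

```python
import random

def simulate_cascade(graph, seeds, rng):
    """Run one independent-cascade sample and return the final infected set.

    graph: dict mapping node u -> list of (v, p_uv) out-edges (assumed layout).
    """
    infected = set(seeds)
    frontier = set(seeds)  # S_t - S_{t-1}: nodes infected in the latest step
    while frontier:
        newly = set()
        for u in frontier:
            for v, p_uv in graph.get(u, []):
                # u gets a single chance to infect each uninfected neighbor v
                if v not in infected and v not in newly and rng.random() < p_uv:
                    newly.add(v)
        infected |= newly
        frontier = newly
    return infected

def estimate_influence(graph, seeds, num_samples=1000, seed=0):
    """Naive Monte Carlo estimate of f(S_0): average final cascade size."""
    rng = random.Random(seed)
    total = sum(len(simulate_cascade(graph, seeds, rng)) for _ in range(num_samples))
    return total / num_samples

# Toy graph with deterministic edges: x infects a and b, b infects c, c never infects d.
toy_graph = {"x": [("a", 1.0), ("b", 1.0)], "b": [("c", 1.0)], "c": [("d", 0.0)]}
toy_influence = estimate_influence(toy_graph, {"x"}, num_samples=10)
```

On the toy graph every sample infects exactly x, a, b and c, so the estimate is exact; with fractional probabilities the number of samples needed for a good estimate depends on the variance of the cascade size.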

The MRC parallel computation model [KSV'10]
• Inspired by the MapReduce paradigm.
• Synchronous round computation on N ⟨k; v⟩ (key; value) tuples. On every round:
  1. Map: apply a local transformation to tuples in a streaming fashion.
  2. Reduce: polytime computation on aggregates of tuples with the same key.
• Constraints: N^{1−ε} machines and N^{1−ε} space per machine; log^c N rounds.
• Graph access model: link server.
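The data flow of a single round can be pictured with the following single-machine Python sketch. It deliberately ignores the machine and space constraints and only shows the map, group-by-key, reduce sequence; `mapreduce_round`, `edge_mapper` and `degree_reducer` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

def mapreduce_round(tuples, mapper, reducer):
    """Simulate one synchronous round over (key, value) tuples."""
    groups = defaultdict(list)
    for key, value in tuples:  # Map: stream over the tuples one at a time
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)  # shuffle: group values by key
    output = []
    for key, values in groups.items():  # Reduce: one call per distinct key
        output.extend(reducer(key, values))
    return output

def edge_mapper(u, v):
    """Emit a count of 1 for the source endpoint of edge (u, v)."""
    yield (u, 1)

def degree_reducer(u, counts):
    """Sum the counts that share a key to get the out-degree of u."""
    yield (u, sum(counts))

edges = [("x", "a"), ("x", "b"), ("b", "c")]
out_degrees = dict(mapreduce_round(edges, edge_mapper, degree_reducer))
```

Chaining several calls to `mapreduce_round`, each consuming the previous output, mimics a multi-round computation.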

The Information Requirements of Influence Estimation

• Question: how much of the graph do we need to know in order to estimate the influence?

• Concretely: what is the query complexity of approximating f(S_0)?

Theorem: Let ε ∈ (0, 1/3), δ ∈ [0, 1/2]. For large enough n = |V|, ∃ G = (V, E, p) for which getting an estimate f̂(S_0) of f(S_0) s.t. (1 − ε) f(S_0) ≤ f̂(S_0) ≤ f(S_0) with confidence ≥ 1 − δ requires Ω((1 − 2δ) n) link-server queries.

Main Algorithm: Intermediate Layer; Estimating the Expectation

Main intuitions:
1. High/low influence means few/many samples needed.
2. Can compute the Riemann integral for the CDF: f(S_0) = ∫_0^∞ (1 − F(I)) dI.
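Because a cascade size I is a nonnegative integer, the integral of 1 − F(I) reduces to a finite sum of tail probabilities Pr[I > i]. The short Python check below illustrates that identity on an empirical distribution; the function name `mean_via_tail_sum` and the sample values are made up for illustration and are not taken from the poster.

```python
def mean_via_tail_sum(samples):
    """Sample mean computed as sum over i >= 0 of (1 - F(i)).

    F is the empirical CDF of the nonnegative integer samples,
    so 1 - F(i) is the fraction of samples strictly greater than i.
    """
    count = len(samples)
    total = 0.0
    for i in range(max(samples)):
        tail = sum(1 for s in samples if s > i) / count
        total += tail
    return total

cascade_sizes = [1, 3, 3, 5]  # hypothetical cascade sizes
direct_mean = sum(cascade_sizes) / len(cascade_sizes)
tail_mean = mean_via_tail_sum(cascade_sizes)
```

Both quantities agree, which is why estimating the tail probabilities at a set of thresholds is enough to recover the expectation.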

Lemma (informal): w.h.p., VerifyGuess returns True if τ > influence.

Main Algorithm: Top Layer; Iterating on Candidate Values

Theorem: For any ε ∈ (0, 1/4), InfEst provides a (1 + 8ε)-approx. w.h.p.
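The poster does not spell out how the candidate values are chosen, so the following Python sketch is only one plausible reading: it assumes a geometric grid of guesses τ = 1, (1 + ε), (1 + ε)^2, ... and a hypothetical `verify_guess` callback that behaves like an idealized VerifyGuess, returning True when τ exceeds the influence. The function name `iterate_candidates` and the grid are assumptions, not the authors' InfEst.

```python
def iterate_candidates(n, eps, verify_guess):
    """Scan geometric candidates and return the first accepted guess.

    n: number of nodes (the largest possible influence).
    eps: grid resolution; consecutive guesses differ by a factor (1 + eps).
    verify_guess: assumed oracle, True when the guess exceeds the influence.
    """
    tau = 1.0
    while tau <= n:
        if verify_guess(tau):
            return tau  # previous guess was rejected, so tau is within (1 + eps)
        tau *= 1.0 + eps
    return float(n)  # no guess accepted: influence is at the maximum

true_influence = 37.5  # hypothetical value used to build an exact oracle
estimate = iterate_candidates(100, 0.1, lambda tau: tau > true_influence)
```

With an exact oracle the returned guess overshoots the influence by at most a factor (1 + ε); a randomized oracle that is only correct w.h.p. would loosen this, which is consistent with the larger constant in the theorem.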

Algorithm: Bottom Layer; Bounded Samples

Highlights:
Input: seeds S_0 ⊆ V, link-server oracle Q(·), L = # samples, bound t.
1. Simultaneous prob-BFS in MapReduce, one layer per round.
2. Finalize a sample when at least t nodes are infected.
3. Return the fraction of samples that reached the bound.

Theorem (informal):
1) If L is suff. small, # of tuples is Ω(m^{1.5} log^4 m).
2) If diam(G) = log^c n, the algorithm takes polylog rounds.
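The three highlighted steps can be sketched sequentially in Python as follows. This is a single-machine illustration, not a MapReduce implementation: each pass of the outer loop stands for one round that advances every live sample by one BFS layer. The function name `bounded_samples` and the `query` callback, assumed to return a list of (neighbor, probability) pairs for a node, are hypothetical stand-ins for the link-server oracle.

```python
import random

def bounded_samples(seeds, query, num_samples, bound, rng):
    """Fraction of num_samples cascades that infect at least `bound` nodes."""
    infected = [set(seeds) for _ in range(num_samples)]
    frontier = [set(seeds) for _ in range(num_samples)]
    reached = [len(seeds) >= bound for _ in range(num_samples)]
    # One pass of this loop corresponds to one BFS layer for all samples.
    while any(frontier[i] and not reached[i] for i in range(num_samples)):
        for i in range(num_samples):
            if reached[i] or not frontier[i]:
                continue  # sample already finalized or cascade died out
            newly = set()
            for u in frontier[i]:
                for v, p_uv in query(u):  # assumed link-server call
                    if v not in infected[i] and rng.random() < p_uv:
                        newly.add(v)
                        infected[i].add(v)
            frontier[i] = newly
            if len(infected[i]) >= bound:
                reached[i] = True  # finalize: stop expanding this sample
    return sum(reached) / num_samples

def path_query(u):
    """Toy oracle: a directed path 0 -> 1 -> ... -> 5 with certain edges."""
    return [(u + 1, 1.0)] if u < 5 else []

fraction_small_bound = bounded_samples({0}, path_query, 4, 3, random.Random(0))
fraction_large_bound = bounded_samples({0}, path_query, 4, 10, random.Random(0))
```

Stopping a sample as soon as it reaches the bound caps the work per sample, and the number of loop passes is at most the depth of the deepest cascade, which matches the intuition behind the diameter condition in the theorem.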

Experimental Results

[Figure labels: example graph G with nodes a, b, c, d, x and seed set S_0; edge probabilities p_xa, p_xb, p_bc, p_cd, p_da; link server (L-S) answering a query Q(u) with N+(u) and p.]

[Plots: Running time; Approx. ratios (Epinions); Heuristics (Epinions)]

Scalable estimation of the influence of individuals in massive social networks
