recurrent subgraph prediction: presubsnagrech/papers/presubasonam2015_slides.pdf · saurabh...

Saurabh Nagrecha and Nitesh V. Chawla University of Notre Dame

Recurrent Subgraph Prediction: PReSub

Horst Bunke University of Bern


Interactions in networks aren’t always dyadic.

Fig 1a: Simple group messages (hub-and-spoke).

Fig 1b: More complex hierarchies in businesses.


Influence in social networks –  [Leskovec, J., et al. Advances in Knowledge Discovery and Data Mining (2006)]

Inferring attacks in anonymized social networks –  [Backstrom, L., et al. WWW (2007)]

Functional Discovery in biological networks –  [Hu, H., et al. Bioinformatics (2005)]


Fig 2: Weekly snapshots of the Enron email corpus.

Source Node    Destination Node    t
432            23432               54
4254           437854              54
473743         32                  55
93535          35443               55

Snapshot timestamps: t(n-1), t(n), t(n+1)
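A minimal sketch of turning the timestamped edge list above into per-timestamp snapshots. The function name `build_snapshots` is hypothetical, and the sketch assumes each row is (source node, destination node, t) with t already bucketed (for example, a week index).

```python
from collections import defaultdict

# Rows from the example edge list: (source node, destination node, t).
edges = [
    (432, 23432, 54),
    (4254, 437854, 54),
    (473743, 32, 55),
    (93535, 35443, 55),
]

def build_snapshots(edge_list):
    """Group timestamped edges into one set of links per timestamp."""
    snapshots = defaultdict(set)
    for src, dst, t in edge_list:
        snapshots[t].add((src, dst))
    return dict(snapshots)

snapshots = build_snapshots(edges)
for t in sorted(snapshots):
    print(t, sorted(snapshots[t]))
```

Each snapshot is a plain set of links, which matches the "bag of links" view used later in the deck.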

Fig 3: Distribution of recurrent edges in the Enron network (across all timestamps). Axes: Recurrent Edges; Frequency of Recurrence.


Say we have network snapshots as below:
G1 = {l1, l2, l3}
G2 = {l2, l3, l4}
G3 = {l1, l2, l4}
G4 = {l1, l2, l3}

We aren’t interested in the links that are static signals; instead we want to register the “blips”:
G’1 = { }
G’2 = { }
G’3 = {l1}
G’4 = {l3}
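One reading that reproduces the example is that a "blip" is a link present in the current snapshot, absent from the immediately preceding one, and seen at some earlier time. That definition is an assumption inferred from the four snapshots, not stated on the slide. The sketch below (with the hypothetical name `recurrent_links`) applies it to G1 through G4.

```python
def recurrent_links(snapshots):
    """Return, per snapshot, the links that reappear after an absence.

    Assumed definition: a link is recurrent at time t if it occurs at t,
    did not occur at t-1, and occurred at some time before t-1.
    """
    seen = set()       # every link observed in any earlier snapshot
    previous = set()   # links in the immediately preceding snapshot
    blips = []
    for current in snapshots:
        blips.append({l for l in current if l in seen and l not in previous})
        seen |= current
        previous = current
    return blips

G = [
    {"l1", "l2", "l3"},
    {"l2", "l3", "l4"},
    {"l1", "l2", "l4"},
    {"l1", "l2", "l3"},
]
blips = recurrent_links(G)
print(blips)
```

Under this reading l2 never registers because it is present in every snapshot, and l4 does not register in G2 because it is new rather than recurring.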

Network instance = bag of links = set of transactions

Most frequently recurring links = GetFrequentTransactions(allTransactions, minSupport) = GetFrequentTransactions([G’1, …, G’n], minSupport)
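A brute-force sketch of what GetFrequentTransactions could compute, assuming minSupport is an absolute count of snapshots. It enumerates every subset of each transaction, which is only practical for small link sets; a real miner such as Apriori or FP-growth would prune the search. The transactions below are hypothetical and not taken from the slides.

```python
from collections import Counter
from itertools import combinations

def get_frequent_transactions(all_transactions, min_support):
    """Return link sets contained in at least min_support transactions."""
    counts = Counter()
    for transaction in all_transactions:
        links = sorted(transaction)
        # Count every non-empty subset of this transaction exactly once.
        for size in range(1, len(links) + 1):
            for subset in combinations(links, size):
                counts[frozenset(subset)] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}

# Hypothetical recurrent-link sets G'1 .. G'5.
transactions = [
    {"l1", "l3"},
    {"l1", "l3", "l5"},
    {"l2"},
    {"l1", "l3"},
    set(),
]
frequent = get_frequent_transactions(transactions, min_support=3)
for itemset, support in frequent.items():
    print(sorted(itemset), support)
```

Here the pair {l1, l3} recurs together in three snapshots, so it is a candidate recurrent subgraph rather than two unrelated recurrent links.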


There are predictable patterns in networks. Can we identify:
o  What these patterns are
o  When they occur
o  Effective “early warning” methods to predict them



Solution: Frequent Subgraph Mining

Axes: Recurrent Edges; Frequency of Recurrence.



Solution: Subgraph Prediction



Solution: Early Warning Subgraphs


Predict individual links of the subgraph. If l ∈ subgraph, predict for occurrence of l; if l ∉ subgraph, predict for non-occurrence of l. State-of-the-art link prediction methods employed using LPMade suite [Lichtenwalter, R. & Chawla, N. JMLR (2011)].

Fig 4: Exact graph matching.
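A sketch of how per-link predictions could be combined into an exact-match score for a target subgraph. Multiplying the per-link probabilities assumes the links are independent, which is an assumption of this sketch and not something the slides state. The function name and the probabilities are hypothetical.

```python
def exact_match_score(link_probs, target_links):
    """Score the event that exactly the target links occur.

    For each candidate link l: use P(l occurs) if l is in the target
    subgraph, and P(l does not occur) otherwise. Links are treated as
    independent (an assumption of this sketch).
    """
    score = 1.0
    for link, p in link_probs.items():
        score *= p if link in target_links else (1.0 - p)
    return score

# Hypothetical link-prediction outputs: P(link occurs in the next snapshot).
link_probs = {"l1": 0.9, "l2": 0.8, "l3": 0.1}
target = {"l1", "l2"}
score = exact_match_score(link_probs, target)
print(round(score, 3))
```

A single poorly predicted link pulls the whole score down, which is one way to see why exact matching on individual links is a demanding baseline.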


We use graph edit distances (GEDs) to contextualize the subgraph’s occurrence with respect to the global scenario in the network. Advantages of using GEDs:
o  Flexible definition allows for weighted/unweighted, directed/undirected graphs.
o  Inexact matching allows for “unknown” links in graph instances.
o  Near-linear time approximate implementation [Andoni, A. & Onak, K., SIAM Journal on Computing (2012)].

Fig 5: A pictorial summary of vector space embedding in GEDs.
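A minimal sketch of vector space embedding with edit distances, under a simplifying assumption: nodes carry fixed, unique labels and only link insertions and deletions are charged, at unit cost. In that case the edit distance between two snapshots reduces to the size of the symmetric difference of their link sets. This is not the near-linear approximation cited above, and the prototypes shown are hypothetical.

```python
def edit_distance(g1, g2):
    """Unit-cost edit distance between two link sets over shared node labels.

    Every link present in one graph but not the other costs one
    insertion or deletion.
    """
    return len(g1 ^ g2)

def embed(graph, prototypes):
    """Map a graph to the vector of its distances to each prototype."""
    return [edit_distance(graph, p) for p in prototypes]

# Hypothetical prototype graphs and one network snapshot.
prototypes = [{"l1", "l2"}, {"l3"}, set()]
snapshot = {"l1", "l3", "l4"}
vector = embed(snapshot, prototypes)
print(vector)
```

The resulting fixed-length vector can be fed to any standard classifier, which is the practical appeal of embedding graphs this way.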


Networks often exhibit a telltale “build up” to the desired structure. The subgraphs seen during this build up act as early warning subgraphs: use them as features to predict the target subgraph, and learn from the given data how these breadcrumbs lead to it.

Fig 6: Early warning subgraphs as features to predict the target subgraph.
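A sketch of how early warning subgraphs might be turned into training rows: the features record which early warning subgraphs are present at time t, and the label records whether the target subgraph is present at time t+1. The one-step horizon, the function names, and the toy snapshots are assumptions made for illustration.

```python
def contains(snapshot, subgraph):
    """True if every link of the subgraph occurs in the snapshot."""
    return subgraph <= snapshot

def build_training_rows(snapshots, early_warning, target):
    """Pair features observed at time t with the target label at t+1."""
    rows = []
    for now, nxt in zip(snapshots, snapshots[1:]):
        features = [int(contains(now, s)) for s in early_warning]
        label = int(contains(nxt, target))
        rows.append((features, label))
    return rows

# Hypothetical snapshots showing a build up a -> a,b -> a,b,c.
snapshots = [
    {"a"},
    {"a", "b"},
    {"a", "b", "c"},
    {"a"},
]
early_warning = [{"a"}, {"a", "b"}]
target = {"a", "b", "c"}
rows = build_training_rows(snapshots, early_warning, target)
for features, label in rows:
    print(features, label)
```

Any binary classifier can then be trained on these rows; the choice of classifier is left open here.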


Our method achieves high AUROC performance in predicting subgraphs, and outperforms link prediction on:
o  Commercial cellular phone calls
o  Wikipedia Co-authorship
o  Enron Email Corpus
o  Facebook Wall Posts

Conclusion: We need to think of subgraphs as emergent structures in their own right and not just a composition of links.

Fig 7: AUROC performance of our method vs. baseline link prediction.
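For reference, AUROC can be read as the probability that a randomly chosen positive instance is scored higher than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition; the labels and scores are hypothetical and are not results from the slides.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical labels (1 = subgraph occurred) and predicted scores.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
value = auroc(labels, scores)
print(value)
```

The pairwise loop is quadratic and meant only to make the definition concrete; library implementations use sorting instead.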


Army Research Laboratory (ARL)

U.S. Air Force Office of Scientific Research (AFOSR)

Defense Advanced Research Projects Agency (DARPA)

National Science Foundation (NSF)

Research was supported in part by the Army Research Laboratory under Cooperative Agreement Number W911NF-09-2-0053, National Science Foundation (NSF) Grant OCI-1029584, and by the grant FA9550-12-1-0405 from the U.S. Air Force Office of Scientific Research (AFOSR) and the Defense Advanced Research Projects Agency (DARPA).


–  Leskovec, J., Singh, A. & Kleinberg, J. Patterns of influence in a recommendation network. In Advances in Knowledge Discovery and Data Mining, 380-389 (Springer, 2006).

–  Backstrom, L., Dwork, C. & Kleinberg, J. Wherefore art thou r3579x?: Anonymized social networks, hidden patterns, and structural steganography. In Proceedings of the 16th international conference on World Wide Web, 181-190 (ACM, 2007).

–  Hu, H., Yan, X., Huang, Y., Han, J. & Zhou, X. J. Mining coherent dense subgraphs across massive biological networks for functional discovery. Bioinformatics 21, i213-i221 (2005).

–  Lichtenwalter, R. N. & Chawla, N. V. LPMade: Link prediction made easy. Journal of Machine Learning Research 12, 2489-2492 (2011).

–  Andoni, A. & Onak, K. Approximating Edit Distance in Near-Linear Time. SIAM Journal on Computing 41, 1635-1648 (2012).


Q & A


Fig 8a (above): How prototypes are selected.

Fig 8b (right): How many prototypes are enough?


Fig 9: If a very broad window is chosen, the fine-grained aspects of recurrence may be lost, and if a very narrow window is chosen, there may be very little difference between two consecutive snapshots.
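A small sketch of the window-width trade-off, using a hypothetical edge list in which one link is constant and another alternates. Jaccard similarity between consecutive snapshots is used here as one possible diagnostic (values near 1 would suggest the window is too narrow to show change); it is an assumption of this sketch, not a measure taken from the slides.

```python
from collections import defaultdict

def snapshots_by_window(edges, width):
    """Bucket (src, dst, t) edges into consecutive windows of the given width."""
    buckets = defaultdict(set)
    for src, dst, t in edges:
        buckets[t // width].add((src, dst))
    return [buckets[k] for k in sorted(buckets)]

def jaccard(a, b):
    """Jaccard similarity of two link sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical edges (src, dst, day): link (1, 2) is constant,
# link (3, 4) appears on alternate days.
edges = [
    (1, 2, 0), (1, 2, 1), (1, 2, 2), (1, 2, 3),
    (3, 4, 0), (3, 4, 2),
]
narrow = snapshots_by_window(edges, width=1)
broad = snapshots_by_window(edges, width=4)
similarities = [jaccard(a, b) for a, b in zip(narrow, narrow[1:])]
print(len(narrow), len(broad), similarities)
```

With a one-day window the alternation of link (3, 4) is visible across four snapshots; with a four-day window everything collapses into a single snapshot and the recurrence is lost.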


Dataset     Number of Nodes    Number of Edges    Time Span
Mobile      8,321,119          712 million        65 days
Wiki        25,323,882         266 million        ~4 years
Enron       87,098             1,147,028          ~4 years
Facebook    46,715             803,744            ~2 years

Table 1: A summary of the data sources used.
