Inside the Atoms: Mining a Network of Networks and Beyond, by Hanghang Tong at BigMine16

Arizona State University

Inside the Atoms: Mining a Network of Networks and Beyond

Hanghang Tong

http://tonghanghang.org

- 1 -

@KDD BigMine 16: the 5th International Workshop on Big Data, Streams and Heterogeneous Source Mining


Hospital Networks

US Power Grid

Biological Networks

Collaboration Networks

Observation: Graphs are everywhere!

- 2 -

Traffic Network

Brain Networks


Graph Mining: An Overview

- 3 -

Observation: Mining stops at nodes/links (atom) level. Q: Is there a level x (x=4, 5, …)? What is it?

graph

subgraph

node/link


A Motivating Example: Cross-Network Association (e.g., candidate gene prioritization problem)

- 4 -

§ Problem Definition
– Given: (1) two networks P and G, and (2) their partial association A;
– Find: missing associations in A.

§ Solutions: Graph Ranking
– Given: a green node (disease);
– Find: the most relevant blue nodes (genes).

P G A

A Powerful Primitive in (A1) drug discovery; (A2) social recommendation; (A3) QA post-tagging, etc.

(PPI)

(Phenotype)



- 5 -


§  Limitations: Each green node (disease) might have its own PPI network!

O. Magger, Y. Y. Waldman, E. Ruppin, and R. Sharan. Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Computational Biology, 8(9), 2012.

P G A



- 6 -

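• A Disease Network P
• A PPI Network G

[Figure: disease network P (nodes 1–7) connected through the association A to a single PPI network G (nodes a–d)]

• A Disease Network P
• A set of tissue-specific PPI Networks G1, …, G7

[Figure: disease network P (nodes 1–7) connected through the association A to tissue-specific PPI networks G1, G2, …, G7, each over nodes a–d]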


A Set of Networks: More Applications

- 7 -

Collaborations

System of Systems

Brain Networks

Cyber-Physics Systems


Roadmap

§ Motivations
§ NoN: A Network of Networks
– NoN Modeling
– NoN Mining
§ Beyond NoN
§ Some of Our Other Recent Work

- 8 -


Modeling NoN

§  Q: How to represent a set of inter-connected networks (e.g., Tissue-Specific PPI Networks)?

- 9 -

[Figure: disease network P (nodes 1–7), association A, and tissue-specific PPI networks G1, G2, …, G7 (each over nodes a–d)]


Introducing the NoN Model

§  A: each green node (disease) itself is a network

- 10 -

NoN (A Network of Networks) := a triplet R = <G, A, θ>
• G: Main Network (the green, disease-to-disease network)
• A: Domain Networks (the blue, tissue-specific PPI networks)
• θ: Mapping function (each green main node → a blue domain network)

J. Ni, H. Tong, W. Fan, X. Zhang: Inside the atoms: ranking on a network of networks. KDD 2014
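The triplet R = <G, A, θ> can be sketched as a small data structure. This is a minimal illustration only: the class name `NoN`, its field names, and the toy matrices are assumptions made for the example, not part of the talk or the KDD 2014 paper.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class NoN:
    """A network of networks R = <G, A, theta> (illustrative layout only)."""
    main: np.ndarray   # G: g x g adjacency matrix of the main network
    domains: list      # A: one adjacency matrix per domain network
    theta: dict        # theta: main node index -> index into `domains`

    def domain_of(self, main_node: int) -> np.ndarray:
        """Return the domain network that a main node expands into."""
        return self.domains[self.theta[main_node]]


# Two similar diseases (main nodes), linked in the main network ...
G = np.array([[0.0, 1.0],
              [1.0, 0.0]])
# ... each with its own toy tissue-specific PPI network (hypothetical data).
ppi_0 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path a-b-c
ppi_1 = np.array([[0, 1], [1, 0]], dtype=float)                   # edge a-b

non = NoN(main=G, domains=[ppi_0, ppi_1], theta={0: 0, 1: 1})
print(non.domain_of(1).shape)
```

Keeping θ as an explicit mapping, rather than assuming main node i always owns domain i, leaves room for the soft (1-to-many) mappings discussed later in the deck.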


NoN Models: Examples

Applications        | The Main Network (G)   | Domain Networks (A)
Gene-Pheno Assoc.   | Disease Sim Network    | Tissue-specific PPI Nets
LBSN                | Geo-proximity network  | Social Networks
Brain Initiative    | Person-Person Network  | Brain Networks
Team of Teams       | Project Dependence Net | Team Networks
Scholarly Data      | Res. Area Sim Network  | Collaboration Networks

- 11 -



NoN - Generalizations

§ G1: Multi-layered NoN
– Candidate Gene Prioritization: Disease-tissue-protein
– Geo-social networks: City-district-person

§ G2: Soft Mapping function θ
– 1-to-many, or many-to-many

- 12 -

• C. Chen, J. He, N. Bliss and H. Tong: “On the Connectivity of Multi-layered Networks: Models, Measures and Optimal Control”. ICDM 2015.


NoN vs. Some Popular Multi-Network Models

§ They are all special cases of our NoN model!
– Tensor: a special NoN with
  1) A full clique main network (G);
  2) All domain networks (A) sharing the same node set
– Hypergraph: a special NoN with
  1) All domain networks (A) being empty
– Multiplex: a special NoN with
  1) Two layers
  2) All domain networks (A) sharing the same node set

- 13 -


Roadmap


– NoN Modeling

– NoN Mining: Ranking and Clustering


- 14 -


NoN Mining - Ranking

- 15 -

A1: Given a disease (e.g., P1), what are the most relevant genes (blue nodes)?

A2: Who is most influential, considering both the within- and cross-area influence?


Ranking on a Single Network

- 16 -

Background

Ranking vector r4 (query: Node 4):

Node      Score
Node 1    0.13
Node 2    0.10
Node 3    0.13
Node 4    0.22
Node 5    0.13
Node 6    0.05
Node 7    0.05
Node 8    0.08
Node 9    0.04
Node 10   0.03
Node 11   0.04
Node 12   0.02

[Figure: 12-node example graph with each node labeled by its ranking score]
• More red, more relevant
• Nearby nodes, higher scores

H. Tong, C. Faloutsos, J.-Y. Pan: Fast Random Walk with Restart and Its Applications. ICDM 2006. (Best Paper Award at ICDM 2006; ICDM 2015 10-Year Highest Impact Paper Award)



- 17 -


Footnote: “Maxwell Equation” for Web [Soumen Chakrabarti]

$r_i = c A r_i + (1 - c) e_i$
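The fixed-point equation above can be solved by simple power iteration. The sketch below is one possible way to do it, under stated assumptions: A is taken to be the column-normalized adjacency matrix, c = 0.85 is an arbitrary choice, and the function name `rwr` and the toy path graph are hypothetical (they are not the 12-node example from the slide).

```python
import numpy as np


def rwr(A, query, c=0.85, tol=1e-10, max_iter=1000):
    """Iterate r <- c * A_norm @ r + (1 - c) * e until the update is tiny."""
    # Column-normalize so every column of A_norm sums to 1 (assumed choice).
    A_norm = A / A.sum(axis=0, keepdims=True)
    e = np.zeros(A.shape[0])
    e[query] = 1.0                      # restart (query preference) vector
    r = e.copy()
    for _ in range(max_iter):
        r_next = c * A_norm @ r + (1 - c) * e
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
    return r


# Toy path graph 0 - 1 - 2 - 3 (hypothetical data).
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
r = rwr(A, query=0)

# The same vector in closed form: r = (1 - c) (I - c A_norm)^{-1} e.
closed_form = 0.15 * np.linalg.solve(np.eye(4) - 0.85 * (A / A.sum(axis=0)),
                                     np.eye(4)[:, 0])
print(np.allclose(r, closed_form))
```

The iteration only needs matrix-vector products, which is why it scales to sparse graphs; the closed form is shown purely as a correctness check on a tiny example.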



- 18 -

Background

An Optimization Viewpoint of “Maxwell Equation” for Web (Symmetric A)

$r_i = c A r_i + (1 - c) e_i$

$\quad = \arg\min_{r_i} \; \underbrace{c \, r_i^\top (I - A) r_i}_{\text{network smoothness}} + \underbrace{(1 - c) \| r_i - e_i \|^2}_{\text{query preference}}$
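A quick numerical check of the optimization viewpoint: for a symmetric A, setting the gradient of the objective to zero gives (I - cA) r = (1 - c) e, which is the fixed-point equation rearranged. The sketch below assumes the symmetric normalization A = D^{-1/2} W D^{-1/2} and c = 0.85; the toy graph and the helper name `objective` are illustrative assumptions.

```python
import numpy as np


def objective(r, A, e, c):
    """c * r'(I - A)r + (1 - c) * ||r - e||^2 for a symmetric A."""
    n = len(r)
    return c * r @ (np.eye(n) - A) @ r + (1 - c) * np.sum((r - e) ** 2)


# Toy triangle-plus-tail graph (hypothetical), symmetrically normalized:
# A = D^{-1/2} W D^{-1/2}, so A is symmetric with eigenvalues in [-1, 1].
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)
A = W / np.sqrt(np.outer(d, d))

c = 0.85
e = np.array([1.0, 0.0, 0.0, 0.0])

# Zero gradient: 2c(I - A)r + 2(1 - c)(r - e) = 0  <=>  (I - cA) r = (1 - c) e.
r_star = np.linalg.solve(np.eye(4) - c * A, (1 - c) * e)

# Random nearby points should never beat the stationary point.
rng = np.random.default_rng(0)
perturbed = [objective(r_star + 0.01 * rng.standard_normal(4), A, e, c)
             for _ in range(100)]
print(objective(r_star, A, e, c) <= min(perturbed))
```

Because I - cA is positive-definite for c < 1, the objective is strictly convex and the stationary point is its unique minimizer.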


Ranking on NoN

§ Optimization Formulation:
#1: within-network smoothness
#2: query preference
#3: cross-network consistency

§ Intuition:
– An overlapped node gets similar ranking scores in domain networks i and j if G(i,j) is high.
– A set of g correlated random walks

- 19 -

J. Ni, H. Tong, W. Fan, X. Zhang: Inside the atoms: ranking on a network of networks. KDD 2014


Ranking on NoN

§ Optimization Formulation:
#1: within-network smoothness
#2: query preference
#3: cross-network consistency

§ Equivalence: J(r) = J(r1, …, rg)
– Intuition: a single random walk on the integrated graph Ã
– Property: J(r) is positive-definite!

- 20 -


Ranking on NoN

§ Equivalence: J(r) = J(r1, …, rg)
– Intuition: one single random walk on the integrated graph Ã
– Property: J(r) is positive-definite!

§ Algorithms:
– #1: A linear algorithm → the optimal solution
– #2: Any existing fast solution on a single network
– #3: Further speedup: O(T(m + ng)) → O(T(g log(g) + z))
  • g << n; and z << m (key idea: using the main network to do pruning)

- 21 -
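To make the three-term idea concrete, here is a deliberately simplified sketch. It is not the formulation from the KDD 2014 paper, whose exact objective is not reproduced on these slides. The assumptions are: all domain networks share one node set, the cross-network term is a · Σ_{i<j} G(i,j) ‖r_i − r_j‖², and the weight `a`, the function names, and the toy data are all hypothetical. Under these assumptions the joint objective is a positive-definite quadratic, so stacking r_1, …, r_g into one vector reduces ranking to a single linear system.

```python
import numpy as np


def sym_normalize(W):
    """Return D^{-1/2} W D^{-1/2} for a symmetric weight matrix W."""
    d = W.sum(axis=1)
    return W / np.sqrt(np.outer(d, d))


def rank_on_non(G, domains, queries, c=0.85, a=0.5):
    """Minimize the assumed three-term objective with one linear solve.

    G       : g x g symmetric main-network adjacency (zero diagonal)
    domains : list of g symmetric n x n domain adjacencies (shared node set)
    queries : list of g query node indices, one per domain network
    """
    g, n = len(domains), domains[0].shape[0]
    L = np.diag(G.sum(axis=1)) - G             # Laplacian of the main network
    M = a * np.kron(L, np.eye(n))              # #3: cross-network consistency
    b = np.zeros(g * n)
    for i, (W, q) in enumerate(zip(domains, queries)):
        block = slice(i * n, (i + 1) * n)
        # #1 + #2: smoothness and query preference give the block I - c * A_i
        M[block, block] += np.eye(n) - c * sym_normalize(W)
        b[i * n + q] = 1 - c
    return np.linalg.solve(M, b).reshape(g, n)


# Toy NoN (hypothetical data): two linked main nodes, two 3-node domains.
G = np.array([[0.0, 1.0], [1.0, 0.0]])
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)

r_independent = rank_on_non(G, [path, triangle], [0, 0], a=0.0)
r_coupled = rank_on_non(G, [path, triangle], [0, 0], a=5.0)
gap = lambda r: np.linalg.norm(r[0] - r[1])
print(gap(r_coupled) < gap(r_independent))
```

With a = 0 the system decouples into g independent single-network rankings; raising a pulls the scores of the same node in linked domain networks toward each other. A dense solve is used here only for clarity and would not scale to the sizes the slide's complexity bounds refer to.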


NoN Ranking - Results

- 22 -

A1: Candidate Gene Prioritization
• Which genes are most relevant wrt disease a?
[Figure: ROC Curve Comparison]

A2: Co-authorship Prediction
• Which DM authors are most likely to collaborate with a given Med author?
[Figure: AUC and Accuracy]


NoN Mining - Clustering

§ Obj. Function: Similar Intuition!

§ Results: P-value vs. (biologically meaningful) clusters

- 23 -

J. Ni, H. Tong, W. Fan, X. Zhang: Flexible and Robust Multi-Network Clustering. KDD 2015


Roadmap


– NoN Modeling

– NoN Mining

§ Beyond NoN: From NoN to NoX
§ Some of Our Other Recent Work

- 24 -


NoT: A Network of Time Series

- 25 -

§ Problem Definition
§ Models & Algorithms
§ Results

[Figure: coordinate vs. frame # for the original series, DCMF, DMF, and dynaMMo]
[Figure: excerpt from a motion-capture marker placement guide]

• Y. Cai, H. Tong, W. Fan and P. Ji: Fast Mining of a Network of Coevolving Time Series. SDM 2015.
• Y. Cai, H. Tong, W. Fan, P. Ji, Q. He: Facets: Fast Comprehensive Mining of Coevolving High-order Time Series. KDD 2015


iBall: A Network of Regression Models

- 26 -

§ Problem Definition
§ Models & Algorithms
§ Results

[Figure labels: D1, D2, D3, D4]

• Y. Yao, H. Tong, F. Xu, J. Lu: Predicting long-term impact of CQA posts: a comprehensive viewpoint. KDD 2014
• L. Li, H. Tong: The Child is Father of the Man: Foresee the Success at the Early Stage. KDD 2015.
• “Data Mining Reveals the Secret to Getting Good Answers”, MIT Technology Review, 2013


Fascinate: Cross-Layer Dependence Inference on Multi-Layered Networks

- 27 -

§ Problem Definition: Infer Unobserved Cross-Layer Links
§ Methods: Cross-Layer Inference = Collective CF
§ Results: Effectiveness; Efficiency

• C. Chen, J. He, N. Bliss and H. Tong: “On the Connectivity of Multi-layered Networks: Models, Measures and Optimal Control”. ICDM 2015.
• C. Chen, H. Tong, L. Xie, L. Ying and Q. He: “FASCINATE: Fast Cross-Layer Dependence Inference on Multi-layered Networks”. KDD 2016, 3:15pm, Monday, Plaza Room A/B


Conclusion: a Network of X

§ Summary
– NoN: Network + Networks
– NoT: Network + Time Series
– iBall: Network + Regression
– Fascinate: Network + Inference

§ Take Home Messages
– Modeling: `No’ (i.e., a Network of X) as the answer
  • Networks as data → as context
– Algorithms: Networks as the contextual regularizer

- 28 -


Roadmap



– Team Replacement

– TravelModeLogger

– BrainQuest

- 29 -

– Network Alignment

– Optimal Networks

– Visual Influence Sum


Replacing the Irreplaceable: Team Replacement Recommendation

- 30 -

§ Problem Definition
§ Sol.
§ System
§ Results

• L. Li, H. Tong, N. Cao, K. Ehrlich, Y.-R. Lin and N. Buchler: Replacing the Irreplaceable: Fast Algorithms for Team Member Recommendation, WWW 2015
• N. Cao, Y.-R. Lin, L. Li, H. Tong: g-Miner: Interactive Visual Group Mining on Multivariate Graphs, ACM CHI 2015
• System prototype & video demo: http://team-net-work.org


Travel Mode Identification w/ Smartphones

- 31 -

§ Prob. Dfn
§ Method
§ Results
§ Open Challenges
– Battery Consumption (sampling rates, sensor selection)
– On-line algorithms
– Adaptive (summer vs. winter; highway vs. local)

• X. Su, H. Tong and P. Ji: Accelerometer-based Activity Recognition on Smartphone. CIKM 2014
• X. Su, H. Caceres, H. Tong and Q. He: Travel Mode Identification with Smartphones. TRB 2015


BrainQuest: Visual Brain Comparison

- 32 -

Quest brains to spot picture diff.

• L. Shi, H. Tong, X. Mu: BrainQuest: Perception-Guided Visual Brain Comparison, ICDM 2015
• L. Shi, H. Tong, M. Daianu, X. Mu and P. Thompson: Block-wise Human Brain Network Visual Comparison Using NodeTrix Representation. VIS'16



- 33 -

Quest computers to spot brain diff.




- 34 -

Quest computers to spot brain diff.

AD group (n1) Control group (n2)




- 35 -

§ Problem Dfn.: Spot structural diff. between two groups of brain networks
§ Model & Algorithm
§ VA Framework
§ Results



Query-Specific Optimal Networks

- 36 -

§ Goal: Optimal Networks
– Query-Specific
– Optimal Topology + Weights
– On-line Learning

§ Methods: VERY efficient way to estimate

$\frac{\partial Q(x, s)}{\partial A_s(i, j)} \propto Q(j, s) \times Q(x, i)$

[Figure labels: query node s, positive node x, edge (i, j), neighbor of s, neighbor of x]

§ + Error Estimation

§ Results: Accuracy (MAP); Scalability

L. Li, Y. Yao, J. Tang, W. Fan, H. Tong: QUINT: On Query-Specific Optimal Networks. KDD 2016. 10:00am, Monday, Plaza Room A/B
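The proportionality between ∂Q(x, s)/∂A_s(i, j) and Q(j, s) × Q(x, i) can be checked numerically under one concrete assumption: that the ranking matrix has the random-walk-with-restart form Q = (I − cA)^{-1}, up to a constant factor. That form, the value c = 0.5, and the random toy matrix are assumptions for this sketch and are not taken from the QUINT paper. In this case the matrix-inverse derivative rule gives dQ = c · Q · dA · Q, so the (x, s) entry is c · Q(x, i) · Q(j, s).

```python
import numpy as np

c = 0.5
rng = np.random.default_rng(0)
W = rng.random((5, 5))
A = W / W.sum(axis=0, keepdims=True)      # column-normalized toy adjacency


def Q_of(A):
    """Assumed ranking matrix: Q = (I - cA)^{-1}."""
    return np.linalg.inv(np.eye(A.shape[0]) - c * A)


Q = Q_of(A)
x, s, i, j = 3, 0, 1, 2                   # arbitrary indices for the check

# Analytic: dQ = c * Q @ dA @ Q, so dQ(x, s)/dA(i, j) = c * Q(x, i) * Q(j, s).
analytic = c * Q[x, i] * Q[j, s]

# Central finite difference on the single entry A[i, j].
h = 1e-6
A_plus, A_minus = A.copy(), A.copy()
A_plus[i, j] += h
A_minus[i, j] -= h
numeric = (Q_of(A_plus)[x, s] - Q_of(A_minus)[x, s]) / (2 * h)

print(np.isclose(analytic, numeric))
```

The practical point is that, once Q is available, every entry of the gradient is a product of two already-computed scores, so no extra matrix inversion is needed per edge.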


Attributed Network Alignment

§ Problem Dfn.
§ Formulation
§ Algorithms
• Iterative Alg.
• Global Optimal
• Same Complexity as ISORANK
• Further Speed-up
  • Low-Rank Approximation
  • On-Query Alignment (Linear)
§ Results: Accuracy vs. Time; Accuracy vs. Noise

• D. Koutra, H. Tong, D. Lubensky: BIG-ALIGN: Fast Bipartite Graph Alignment. ICDM 2013.
• S. Zhang and H. Tong: FINAL: Fast Attributed Network Alignment. KDD 2016, 3:15pm, Monday, Plaza Room A/B


Vegas: Influence Graph Visual Summarization

- 38 -

§ Prob. Dfn.
§ Solution
§ Results

[Figure labels: Who/What; How/Why; “Stochastic High-Level Petri Net and Applications”]

• L. Shi, H. Tong, J. Tang and C. Lin: Flow-based Influence Graph Visual Summarization, ICDM 2014
• L. Shi, H. Tong, J. Tang, C. Lin: VEGAS: Visual influEnce GrAph Summarization on Citation Networks. TKDE 2015


Q&A

Inside the atom is a whole new world!

- 39 -

• “A whole new world
• Every turn a surprise
• With new horizons to pursue
• Every moment red-letter ……”


Acknowledgement

§ Collaborators:
– Norbou Buchler, Nan Cao, Madelaine Daianu, Kate Ehrlich, Wei Fan, Qing He, Ping Ji, Yu-ru Lin, Lei Shi, Chuang Lin, Jie Tang, Paul M. Thompson, Lei Xie, Yuan Yao, Lei Ying, Xiang Zhang

§ Students:
– Liangyue Li
– Chen Chen
– Yongjie Cai (now at Google)
– Xing Su
– Si Zhang

- 40 -
