Neighborhood Formation and Anomaly Detection in Bipartite Graphs Jimeng Sun Huiming Qu Deepayan Chakrabarti Christos Faloutsos Speaker: Jimeng Sun

Post on 19-Dec-2015


TRANSCRIPT

Page 1

Neighborhood Formation and Anomaly Detection in Bipartite Graphs

Jimeng Sun Huiming Qu

Deepayan Chakrabarti Christos Faloutsos

Speaker: Jimeng Sun

Page 2

Bipartite Graphs

• G = {V1 ∪ V2, E}, where every edge in E connects a node in V1 to a node in V2

• Many applications can be modeled using bipartite graphs

• The key is to utilize these links across two natural groups for data mining

[Figure: bipartite graph with nodes a1, a2, …, ak in V1 and t1, t2, …, tn in V2, connected by the edge set E]
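A bipartite graph can be stored as two adjacency maps, one per side, so that every edge is recorded from both of its endpoints. The minimal sketch below assumes unweighted edges; the class name `BipartiteGraph` and its methods are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class BipartiteGraph:
    """Hypothetical minimal container for G = {V1 ∪ V2, E}.

    Every edge joins a node of V1 to a node of V2, so each side keeps its own
    adjacency map; the same label may appear on both sides without clashing.
    """

    def __init__(self):
        self.v1_adj = defaultdict(set)  # node in V1 -> set of its neighbors in V2
        self.v2_adj = defaultdict(set)  # node in V2 -> set of its neighbors in V1

    def add_edge(self, a, t):
        """Record an undirected edge between a (in V1) and t (in V2)."""
        self.v1_adj[a].add(t)
        self.v2_adj[t].add(a)

    def num_edges(self):
        # Each edge appears exactly once in v1_adj.
        return sum(len(ts) for ts in self.v1_adj.values())


# Example: authors (V1) vs. papers (V2), as in the publication-network application.
g = BipartiteGraph()
for author, paper in [("a1", "t1"), ("a1", "t2"), ("a2", "t2"), ("a3", "t3")]:
    g.add_edge(author, paper)
print(sorted(g.v2_adj["t2"]))  # ['a1', 'a2']
print(g.num_edges())  # 4
```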

Page 3

Problem Definition

• Neighborhood formation (NF)

• Given a query node a in V1, what are the relevance scores of all the nodes in V1 to a?

• Anomaly detection (AD)

• Given a query node a in V1, what are the normality scores of the nodes in V2 that link to a?

[Figure: query node a in V1, with example scores annotated on nodes of V1 and V2]

Page 4

Application I: Publication network

• Authors vs. papers in research communities

• Interesting queries:

• Which authors are most related to Dr. Carman?

• Which is the most unusual paper written by Dr. Carman?

Page 5

Application II: P2P network

• Users vs. files in P2P systems

• Interesting queries:

• Find users whose preferences are similar to mine

• Locate files that are downloaded by users with very different preferences

[Figure: users vs. files]

Page 6

Application III: Financial trading

• Traders vs. stocks in stock markets

• Interesting queries:

• Which stocks are most similar to company A's stock?

• Find the most unusual traders (i.e., those who trade across sectors)

Page 7

Application IV: Collaborative filtering

• Collaborative filtering

• Recommendation systems

[Figure: customers vs. products]

Page 8

Outline

• Problem definition

• Motivation

• Neighborhood formation

• Anomaly detection

• Experiments

• Related work

• Conclusion and future work


Page 10

Neighborhood formation – intuition

Input: a graph G and a query node q

Output: relevance scores to q

• Perform a random walk with restart from q in V1

• Record the probability of visiting each node in V1

• The nodes with the highest visiting probabilities are the neighbors of q

[Figure: query node q in V1, with example visiting probabilities (.3, .2, .05, .01, .002, .01) annotated on other nodes]
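The intuition above can be simulated directly. The sketch below is a Monte Carlo estimate, not necessarily how the authors computed the scores: it walks the graph for a fixed number of steps, jumps back to q with probability c at each step (c = 0.15 is an assumed value), otherwise moves to a uniformly random neighbor (edge weights are ignored), and counts visits to V1 nodes. The function name `rwr_monte_carlo` and the adjacency-dict input format are hypothetical.

```python
import random
from collections import Counter


def rwr_monte_carlo(v1_adj, v2_adj, q, c=0.15, steps=100_000, seed=0):
    """Estimate relevance scores of V1 nodes to q by simulating a random walk with restart.

    v1_adj maps each V1 node to a list of its V2 neighbors and v2_adj the reverse;
    q must have at least one neighbor. c is the (assumed) restart probability.
    """
    rng = random.Random(seed)
    visits = Counter()
    in_v1, node = True, q  # the walk starts at the query node q in V1
    for _ in range(steps):
        if in_v1:
            visits[node] += 1  # record visits to V1 nodes only
        if rng.random() < c:
            in_v1, node = True, q  # restart: jump back to q
        elif in_v1:
            in_v1, node = False, rng.choice(v1_adj[node])  # step to a random V2 neighbor
        else:
            in_v1, node = True, rng.choice(v2_adj[node])  # step back to a random V1 neighbor
    total = sum(visits.values())
    return {a: n / total for a, n in visits.items()}  # estimated visiting probability


v1_adj = {"a1": ["t1", "t2"], "a2": ["t1"], "a3": ["t2", "t3"], "a4": ["t3"], "a5": ["t4"]}
v2_adj = {"t1": ["a1", "a2"], "t2": ["a1", "a3"], "t3": ["a3", "a4"], "t4": ["a5"]}
scores = rwr_monte_carlo(v1_adj, v2_adj, "a1")
neighbors = sorted(scores, key=scores.get, reverse=True)  # highest probability first
print(neighbors)
```

Sorting the estimated scores gives q's neighborhood. Nodes the walk never reaches (here a5, which lies in a different connected component) get no score at all, and more steps reduce the sampling noise.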

Page 11

Exact neighborhood formation

Input: a graph G and a query node q

Output: relevance scores to q

• Construct the transition matrix P, where:

• every node in the graph becomes a state;

• every state has a restart probability c of jumping back to the query node q;

• with the remaining probability (1-c), the walk moves along an edge of the current node (transition probability).

• Find the steady-state probability vector u; its entries are the relevance scores of all nodes to q.

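[Figure: from every state, the walk restarts at q with probability c; otherwise it moves along an edge with probability (1-c)]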
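The steady-state vector u satisfies u = (1 - c) P u + c e_q, where e_q is the indicator vector of q and P is taken to be the column-normalized adjacency matrix (so each step follows a uniformly chosen edge, an assumption for unweighted graphs). The pure-Python sketch below finds u by power iteration; the function name `rwr_exact`, the default c = 0.15 and the tolerance are illustrative assumptions.

```python
from collections import defaultdict


def rwr_exact(edges, q, c=0.15, tol=1e-12, max_iter=1000):
    """Relevance of every V1 node to query node q (q must have at least one edge).

    Iterates u <- c * e_q + (1 - c) * P u, where P is the column-normalized
    adjacency matrix of the bipartite graph over V1 ∪ V2.
    """
    # Undirected adjacency over V1 ∪ V2; nodes are tagged with their side.
    adj = defaultdict(list)
    for a, t in edges:
        adj[("V1", a)].append(("V2", t))
        adj[("V2", t)].append(("V1", a))
    source = ("V1", q)
    u = {v: 0.0 for v in adj}
    u[source] = 1.0
    for _ in range(max_iter):
        new = {v: 0.0 for v in adj}
        new[source] = c  # restart: probability c of jumping back to q
        for v, nbrs in adj.items():
            # walk: the remaining (1 - c) of v's mass is split evenly over its edges
            share = (1 - c) * u[v] / len(nbrs)
            for w in nbrs:
                new[w] += share
        converged = sum(abs(new[v] - u[v]) for v in adj) < tol
        u = new
        if converged:
            break
    # Keep V1 nodes only; nodes unreachable from q keep a score of 0.
    return {v[1]: s for v, s in u.items() if v[0] == "V1"}


edges = [("a1", "t1"), ("a2", "t1"), ("a1", "t2"), ("a3", "t2"),
         ("a3", "t3"), ("a4", "t3"), ("a5", "t4")]
scores = rwr_exact(edges, "a1")
print(sorted(scores, key=scores.get, reverse=True))  # ['a1', 'a3', 'a2', 'a4', 'a5']
```

Each iteration shrinks the error by a factor of (1 - c), so convergence is fast, but every iteration touches every node and edge of the graph, which is the per-query cost behind the scalability problem of exact NF.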

Page 12

Approximate neighborhood formation

• Scalability problem with exact neighborhood formation:

• it is too expensive to compute for every single node in V1

• Observation:

• nodes that are far away from q have relevance scores close to 0

• Idea:

• partition the graph and apply neighborhood formation only to the partition containing q
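A minimal sketch of the partition idea, assuming the partition has already been computed by an external graph partitioner (for example METIS [Karypis and Kumar98]; not shown here). It restricts the edge list to the partition containing q and runs the same power-iteration random walk with restart on that subgraph only; nodes outside the partition implicitly get relevance 0. The names `approx_nf`, `rwr_exact` and `partition`, and the toy two-way partition, are hypothetical.

```python
from collections import defaultdict


def rwr_exact(edges, q, c=0.15, tol=1e-12, max_iter=1000):
    """Power-iteration RWR: relevance of every V1 node to q (q needs at least one edge)."""
    adj = defaultdict(list)  # undirected adjacency over V1 ∪ V2, nodes tagged by side
    for a, t in edges:
        adj[("V1", a)].append(("V2", t))
        adj[("V2", t)].append(("V1", a))
    source = ("V1", q)
    u = {v: 0.0 for v in adj}
    u[source] = 1.0
    for _ in range(max_iter):
        new = {v: 0.0 for v in adj}
        new[source] = c  # restart with probability c
        for v, nbrs in adj.items():
            for w in nbrs:  # walk with probability (1 - c), split evenly over edges
                new[w] += (1 - c) * u[v] / len(nbrs)
        converged = sum(abs(new[v] - u[v]) for v in adj) < tol
        u = new
        if converged:
            break
    return {v[1]: s for v, s in u.items() if v[0] == "V1"}


def approx_nf(edges, partition, q, c=0.15):
    """Approximate NF: run RWR only on the subgraph induced by q's partition.

    partition maps ("V1", a) and ("V2", t) to a partition id (assumed precomputed).
    q needs at least one edge inside its own partition.
    """
    p = partition[("V1", q)]
    local = [(a, t) for a, t in edges
             if partition[("V1", a)] == p and partition[("V2", t)] == p]
    return rwr_exact(local, q, c)


edges = [("a1", "t1"), ("a1", "t2"), ("a2", "t1"), ("a2", "t2"), ("a2", "t3"),
         ("a3", "t3"), ("a3", "t4"), ("a4", "t3"), ("a4", "t4")]
# Hypothetical two-way partition: {a1, a2, t1, t2} and {a3, a4, t3, t4}.
partition = {("V1", "a1"): 0, ("V1", "a2"): 0, ("V2", "t1"): 0, ("V2", "t2"): 0,
             ("V1", "a3"): 1, ("V1", "a4"): 1, ("V2", "t3"): 1, ("V2", "t4"): 1}
approx = approx_nf(edges, partition, "a1")
exact = rwr_exact(edges, "a1")
print(sorted(approx, key=approx.get, reverse=True))
print(sorted(exact, key=exact.get, reverse=True)[:2])
```

On this toy graph the approximate neighborhood of a1 is {a1, a2}, the same set as the top two nodes of exact NF, while the walk only ever touches half of the graph.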


Page 14

Anomaly detection - intuition

• t in V2 is normal if all a in V1 that link to t belong to the same neighborhood

• e.g. [Figure: two example nodes t, one with low normality and one with high normality]

Page 15

Anomaly detection - method

Input: a query node q from V2

Output: the normality score of q

• Find S, the set of nodes in V1 connected to q

• Compute the relevance scores between the elements of S, denoted rs

• Apply a score function f to rs to obtain the normality score of q, e.g., f(rs) = mean(rs)

[Figure: query node q in V2 and the set S of nodes linked to it]
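The three steps can be sketched as follows, reusing a power-iteration random walk with restart. Two choices here are assumptions rather than details taken from the slides: rs is taken to be the relevance of each node in S to every other node in S (self-relevance excluded), and f is the mean, as in the slide's example. The function names `normality` and `rwr_exact` and the toy graph are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def rwr_exact(edges, q, c=0.15, tol=1e-12, max_iter=1000):
    """Power-iteration RWR: relevance of every V1 node to q (q needs at least one edge)."""
    adj = defaultdict(list)  # undirected adjacency over V1 ∪ V2, nodes tagged by side
    for a, t in edges:
        adj[("V1", a)].append(("V2", t))
        adj[("V2", t)].append(("V1", a))
    source = ("V1", q)
    u = {v: 0.0 for v in adj}
    u[source] = 1.0
    for _ in range(max_iter):
        new = {v: 0.0 for v in adj}
        new[source] = c  # restart with probability c
        for v, nbrs in adj.items():
            for w in nbrs:  # walk with probability (1 - c), split evenly over edges
                new[w] += (1 - c) * u[v] / len(nbrs)
        converged = sum(abs(new[v] - u[v]) for v in adj) < tol
        u = new
        if converged:
            break
    return {v[1]: s for v, s in u.items() if v[0] == "V1"}


def normality(edges, t, c=0.15):
    """Normality of V2 node t: mean pairwise relevance among S, the V1 neighbors of t.

    Returns None when |S| < 2, since no pairs exist to score.
    """
    S = sorted({a for a, t2 in edges if t2 == t})  # step 1: nodes in V1 linked to t
    if len(S) < 2:
        return None
    rs = []
    for a in S:  # step 2: relevance of every other member of S to a
        scores = rwr_exact(edges, a, c)
        rs.extend(scores[b] for b in S if b != a)
    return mean(rs)  # step 3: f(rs) = mean(rs)


# Two clusters {a1, a2} and {a3, a4}; t5 stays inside one cluster, t6 spans both.
edges = [("a1", "t1"), ("a2", "t1"), ("a1", "t2"), ("a2", "t2"),
         ("a3", "t3"), ("a4", "t3"), ("a3", "t4"), ("a4", "t4"),
         ("a1", "t5"), ("a2", "t5"), ("a1", "t6"), ("a3", "t6")]
print(normality(edges, "t5") > normality(edges, "t6"))  # True
```

In the toy graph, t5 links a1 and a2 from the same cluster, while t6 links a1 with a3 from the other cluster, so t6 receives the lower normality score.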


Page 17

Datasets

Dataset                   |V1|   |V2|   |E|    Avg deg (V1)   Avg deg (V2)
Conference-Author (CA)    2687   288K   662K   510            5
Author-Paper (AP)         316K   472K   1M     3              2
IMDB                      553K   204K   2.2M   4              11

Page 18

Goals

[Q1]: Do the neighborhoods make sense? (NF)

[Q2]: How accurate is the approximate NF?

[Q3]: Do the anomalies make sense? (AD)

[Q4]: What about the computational cost?

Page 19

[Q1] Exact NF

• The nodes (x-axis) with the highest relevance scores (y-axis) are indeed highly relevant to the query node.

• The relevance scores quantify how closely each node is related to the query node.

[Figure: relevance score (y-axis) of the most relevant neighbors (x-axis) for two queries: ICDM (CA) and Robert De Niro (IMDB)]

Page 20

[Q2] Approximate NF

• Precision = fraction of the top-k neighbors from exact NF that also appear among the top-k neighbors from approximate NF (ApprNF)

• Precision drops slowly as the number of partitions increases

• Precision remains high over a wide range of neighborhood sizes

[Figure, two panels: precision vs. # of partitions with neighborhood size = 20; precision vs. neighborhood size with # of partitions = 10]
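The precision measure can be written as a short function. This sketch assumes that both exact and approximate NF return a dict of relevance scores and that nodes missing from the approximate result (because they fall outside q's partition) simply cannot be among its top k. The name `precision_at_k` is hypothetical, and whether the query node itself counts as a neighbor is left to the caller.

```python
def precision_at_k(exact_scores, approx_scores, k):
    """Fraction of exact NF's top-k neighbors that approximate NF also ranks in its top k."""

    def top_k(scores):
        # Node ids of the k highest-scoring nodes (ties keep insertion order).
        return set(sorted(scores, key=scores.get, reverse=True)[:k])

    return len(top_k(exact_scores) & top_k(approx_scores)) / k


exact = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
approx = {"a": 0.5, "c": 0.3, "b": 0.15}  # "d" lies outside q's partition
print(precision_at_k(exact, approx, 2))  # 0.5
print(precision_at_k(exact, approx, 3))  # 1.0
```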

Page 21

[Q3] Anomaly detection

• Randomly inject some nodes and edges (biased towards high-degree nodes)

• On average, the genuine nodes have higher normality scores than the injected ones

[Figure: normality scores (y-axis) of genuine and injected nodes]
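The injection step can be sketched as follows. The slide only says that injected edges are biased towards high-degree nodes; this hypothetical `inject_node` helper assumes one concrete reading of that, linking a new V2 node to m distinct V1 nodes, each drawn with probability proportional to its current degree.

```python
import random
from collections import Counter


def inject_node(edges, new_node, m, seed=0):
    """Return a copy of edges plus a new V2 node linked to m distinct V1 nodes.

    V1 nodes are sampled without replacement, each draw with probability
    proportional to the node's degree (a hypothetical reading of
    "biased towards high-degree nodes").
    """
    rng = random.Random(seed)
    candidates = Counter(a for a, _ in edges)  # V1 node -> degree
    chosen = []
    for _ in range(min(m, len(candidates))):
        nodes = list(candidates)
        pick = rng.choices(nodes, weights=[candidates[a] for a in nodes], k=1)[0]
        chosen.append(pick)
        del candidates[pick]  # sample without replacement
    return edges + [(a, new_node) for a in chosen]


edges = [("a1", "t1"), ("a1", "t2"), ("a1", "t3"), ("a2", "t1"), ("a3", "t3")]
augmented = inject_node(edges, "t_injected", m=2)
injected = [a for a, t in augmented if t == "t_injected"]
print(injected)
```

The normality scores of such injected nodes can then be compared with those of genuine V2 nodes.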

Page 22

[Q4] Computational cost

• Even with a small number of partitions, the computational cost can be reduced dramatically.

[Figure: running time (sec) of approximate NF vs. # of partitions]

Page 23

Related Work

• Random walk: [Brin & Page 98] [Haveliwala WWW02]

• Graph partitioning: [Karypis & Kumar 98] [Kannan et al. FOCS00]

• Collaborative filtering: [Shardanand & Maes 95] …

• Anomaly detection: [Aggarwal & Yu SIGMOD01] [Noble & Cook KDD03] [Newman 03]

Page 24

Conclusion

• Two important queries on bipartite graphs: NF and AD

• An efficient method for NF using random walk with restart and graph partitioning techniques

• Based on the results of NF, we can also spot anomalies (AD)

• Effectiveness is confirmed on real datasets

Page 25

Future work and Q & A

• Future work

• What about time-evolving graphs?

• Contact: Jimeng Sun, http://www.cs.cmu.edu/~jimeng