© 2011 IBM Corporation

IBM Research

SIAM-DM 2011, Mesa, AZ, USA

Non-Negative Residual Matrix Factorization w/ Application to Graph Anomaly Detection

Hanghang Tong and Ching-Yung Lin

April 28-30, 2011

Large Graphs are Everywhere!

Figures: Internet Map [Koren 2009], Food Web [2007], Protein Network [Salthe 2004], Social Network [Newman 2005], Web Graph, Terrorist Network [Krebs 2002]

Q: How to find patterns? e.g., community, anomaly, etc.

Matrix Tool for Finding Graph Patterns

A Typical Procedure: Graph → Adj. Matrix A → A = F x G + R
– F x G: low-rank matrices (community)
– R: residual matrix (anomalies)

[Figure: An Illustrative Example]

Improve Interpretation by Non-negativity

A Typical Procedure: Graph → Adjacency Matrix A → A = F x G + R
– F x G: community
– R: anomalies

[Figure: An Example]

Non-negative Matrix Factorization (for community detection): F >= 0; G >= 0

Non-negative Residual Matrix Factorization (for anomaly detection; This Paper): R(i,j) >= 0 for A(i,j) > 0
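
The two constraint families above can be checked mechanically on a small matrix. The following is a minimal sketch in plain Python; the matrices `A`, `F`, `G` and the helper names are made up for illustration and do not come from the talk.

```python
# Toy check of the constraints on A = F x G + R.
# All data and function names here are illustrative assumptions.

def matmul(F, G):
    """Plain-Python product of F (n x r) and G (r x m)."""
    return [[sum(F[i][k] * G[k][j] for k in range(len(G)))
             for j in range(len(G[0]))] for i in range(len(F))]

def residual(A, F, G):
    """R = A - F x G, entry by entry."""
    FG = matmul(F, G)
    return [[A[i][j] - FG[i][j] for j in range(len(A[0]))]
            for i in range(len(A))]

def is_nmf_feasible(F, G):
    """NMF constraint: F >= 0 and G >= 0."""
    return (all(x >= 0 for row in F for x in row)
            and all(x >= 0 for row in G for x in row))

def is_nrmf_feasible(A, F, G, tol=1e-12):
    """NrMF constraint: R(i,j) >= 0 wherever A(i,j) > 0."""
    R = residual(A, F, G)
    return all(R[i][j] >= -tol
               for i in range(len(A)) for j in range(len(A[0]))
               if A[i][j] > 0)

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 0.0, 1.0]]
F = [[1.0], [1.0], [0.0]]
G = [[1.0, 1.0, 0.0]]
G_overshoot = [[2.0, 2.0, 0.0]]   # approximates some edges by more than A

R = residual(A, F, G)
print(R)
print(is_nmf_feasible(F, G), is_nrmf_feasible(A, F, G))
print(is_nrmf_feasible(A, F, G_overshoot))
```

With `G_overshoot`, the product F x G exceeds A on some observed edges, so the residual there is negative and the NrMF constraint fails even though F and G are themselves non-negative.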

Anomaly Detection on Graphs

Social Networks – 'Popularity contest'

Computer Networks – Spammer, Port Scanner, Vulnerable Machines, etc.

Financial Transaction Networks – Fraudulent transaction (e.g., money-laundering ring), scammer

Criminal Networks – New criminal trend

Tele-communication Networks – Tele-marketer

Key Observation: Abnormal Behavior → Actual Activities

Optimization Formulation

General Case
– Weighted Frobenius Form with a Weight matrix: Common in Any Matrix Factorization
– Non-negative residual: Unique in This Paper
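
The equation on this slide did not survive extraction. A plausible reconstruction from the slide labels and from the earlier definitions (A = F x G + R, with R(i,j) >= 0 for A(i,j) > 0) is sketched below. The symbol W for the weight matrix and the exact placement of the weight are assumptions, not taken from the slides.

```latex
\min_{F,\,G} \; \sum_{i,j} W(i,j)\,\bigl(A(i,j) - F(i,:)\,G(:,j)\bigr)^{2}
\quad \text{s.t.} \quad
R(i,j) = A(i,j) - F(i,:)\,G(:,j) \;\ge\; 0
\;\; \text{for all } (i,j) \text{ with } A(i,j) > 0
```

The objective is the part labeled as common to any matrix factorization; the constraint is the non-negative residual requirement labeled as unique to this paper.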

Optimization Formulation

0/1 Weight Matrix (Major Focus of the Paper)
– 0/1 weight: Common in Any Matrix Factorization
– Non-negative residual: Unique in This Paper
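
To make the effect of a 0/1 weight concrete, here is a small sketch that evaluates the squared-error objective only on observed entries. It assumes the weight is 1 where A(i,j) > 0 and 0 elsewhere, which the slides do not state explicitly. The function name and the data are illustrative.

```python
# Objective under an ASSUMED 0/1 weight: weight 1 where A(i,j) > 0, else 0.
# Function name and toy data are illustrative, not from the talk.

def nrmf_objective_01(A, F, G):
    """Sum of squared residuals over entries with A(i,j) > 0 only."""
    total = 0.0
    for i, row in enumerate(A):
        for j, a in enumerate(row):
            if a > 0:  # zero-weight entries are skipped entirely
                approx = sum(F[i][k] * G[k][j] for k in range(len(G)))
                total += (a - approx) ** 2
    return total

A = [[2.0, 1.0, 0.0],
     [1.0, 3.0, 0.0],
     [0.0, 0.0, 1.0]]

F_block = [[1.0], [1.0], [0.0]]
G_block = [[1.0, 1.0, 0.0]]

F_ones = [[1.0], [1.0], [1.0]]
G_ones = [[1.0, 1.0, 1.0]]

obj_block = nrmf_objective_01(A, F_block, G_block)
obj_ones = nrmf_objective_01(A, F_ones, G_ones)
print(obj_block, obj_ones)
```

Under this assumption, entries with A(i,j) = 0 contribute nothing, so the all-ones factors are not penalized for predicting 1 where no edge exists. This is also why the cost of evaluating the objective scales with the number of edges rather than the full matrix size.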

Optimization Formulation with 0/1 Weight Matrix

NrMF with 0/1 Weight Matrix

Q: How to find 'optimal' F and G?
– D1: Quality; C1: non-convexity of opt. objective
– D2: Scalability; C2: large size of the graph

Optimization Method: Batch Mode

Basic Idea 1: Alternating
– Not convex wrt F and G jointly, but convex if fixing either F or G
– With F fixed: argmin over G, s.t. the non-negative residual constraint

Basic Idea 2: Separation
– For each j: argmin over the j-th column of G, s.t. the same constraint
– Standard Quadratic Programming Prob.

Overall Complexity: Polynomial. Can we do better?
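
The per-column subproblem behind the separation idea can be written out as follows. This is a reconstruction, not the slide's own equation: it assumes the 0/1 weight selects entries with A(i,j) > 0 and reuses the notation F(i,:), G(:,j) for rows and columns.

```latex
\text{For each } j \text{ (with } F \text{ fixed):} \qquad
\min_{G(:,j)} \; \sum_{i:\,A(i,j) > 0} \bigl(A(i,j) - F(i,:)\,G(:,j)\bigr)^{2}
\quad \text{s.t.} \quad
F(i,:)\,G(:,j) \;\le\; A(i,j) \;\; \text{for all } i \text{ with } A(i,j) > 0
```

The objective is quadratic in G(:,j) and the constraints are linear, which matches the slide's description of a standard quadratic programming problem. The update for F with G fixed would be symmetric, row by row.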

Optimization Method: Incremental Mode

Basic Idea 1: Recursive
Basic Idea 2: Alternating
Basic Idea 3: Separation

Procedure:
– Input: Adjacency Matrix A
– Initialize: R = A
– Do r times: Rank-1 Approximation, then Update Residual Matrix R
– Output: Final Residual Matrix

Separation: QP for a single variable w/ boundary constraints; can be solved in constant time

Overall Complexity: Linear wrt # of edges
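
The procedure above can be sketched in plain Python as follows. This is an illustrative reading of the slide, not the authors' implementation: it assumes the 0/1 weight covers only observed edges, stores the residual as a dictionary keyed by edge, and solves each single-variable QP by clipping the unconstrained optimum q into an interval [low, up]. All function names, the all-ones initialization of f, and the sweep count are assumptions.

```python
import math

def solve_single_variable(pairs):
    """Minimize sum (t - c*x)^2 over x, subject to c*x <= t for each (c, t).

    Assumes every t >= 0, so x = 0 is always feasible.
    """
    num = den = 0.0
    low, up = -math.inf, math.inf
    for c, t in pairs:
        num += c * t
        den += c * c
        if c > 0:
            up = min(up, t / c)     # c*x <= t gives an upper bound on x
        elif c < 0:
            low = max(low, t / c)   # dividing by c < 0 flips the inequality
    if den == 0.0:
        return 0.0
    q = num / den                   # unconstrained least-squares optimum
    return min(max(q, low), up)     # clip q into [low, up]

def rank1_step(R, n_rows, n_cols, sweeps=10):
    """Alternate over g and f so that f[i]*g[j] <= R[i, j] on every edge."""
    rows = [[] for _ in range(n_rows)]
    cols = [[] for _ in range(n_cols)]
    for (i, j), t in R.items():
        rows[i].append((j, t))
        cols[j].append((i, t))
    f = [1.0] * n_rows              # assumed initialization
    g = [0.0] * n_cols
    for _ in range(sweeps):
        g = [solve_single_variable([(f[i], t) for i, t in cols[j]])
             for j in range(n_cols)]
        f = [solve_single_variable([(g[j], t) for j, t in rows[i]])
             for i in range(n_rows)]
    return f, g

def nrmf_incremental(A_edges, n_rows, n_cols, rank):
    R = dict(A_edges)               # Initialize: R = A (observed edges only)
    factors = []
    for _ in range(rank):           # Do r times
        f, g = rank1_step(R, n_rows, n_cols)
        for (i, j) in R:
            # max(..., 0.0) only removes floating-point rounding noise
            R[i, j] = max(R[i, j] - f[i] * g[j], 0.0)
        factors.append((f, g))
    return factors, R

# Toy weighted graph: three ordinary edges and one unusually heavy edge.
A_edges = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 5.0}
factors, R = nrmf_incremental(A_edges, n_rows=2, n_cols=2, rank=1)
print(R)
print(max(R, key=R.get))
```

In this toy run the rank-1 term absorbs the common edge weight, and the heavy edge is left with the largest residual. Each sweep touches every edge a constant number of times, which is consistent with the linear-in-edges complexity stated on the slide.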

Experimental Evaluation

Effectiveness: [Figure: Accuracy vs. Anomaly Type]

Efficiency: [Figure: Wall-clock Time vs. # of edges]

Batch Method vs. Incremental Method

[Figure: Log Wall-clock time (sec.) per Data Set, Incremental Method vs. Batch Method]

Conclusion

Problem Formulation: Non-negative Residual Matrix Factorization
– A new matrix factorization for interpretable graph anomaly detection

Optimization Methods
– Batch: straightforward, polynomial time complexity
– Incremental: linear time complexity

Future Work
– Other interpretable properties (sparseness) for anomaly detection
– Matrix Factorization w/ Total Non-negativity

Thank you!

htong@us.ibm.com (We are hiring at IBM Research!)

Visual Comparison

[Figure: q shown relative to low and up]
