Automated Social Hierarchy Detection through Email Network Analysis (SNAKDD07)
Ryan Rowe, Germán Creamer, Shlomo Hershkop, Salvatore J Stolfo
Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang
Date: 2008/12/11




Outline

• Introduction
• SNA algorithm
• Results and Discussion
• Conclusions and Future Work


Introduction

• The recent bankruptcy scandals in US companies such as Enron and WorldCom have increased the need to analyze electronic information
  – In order to define risk and identify any conflict of interest among the entities of a corporate household
• Identifying the relationships between entities, or the corporate hierarchy, is not a straightforward task
  – These relationships can be extracted by analyzing email communication data


SNA Algorithm

• For each mail user
  – Analyze and calculate several statistics for each feature of each user
• Construct an email network graph
  – Vertices represent accounts, edges represent communication between two accounts
  – Analyze cliques and other graph-theoretical qualities
  – Combine the results into a social score
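The graph construction step can be sketched as follows. This is a minimal illustration, not the EMT implementation: the `(sender, recipient)` tuple format, the sample names, and the rule that a single message is enough to create an edge are all assumptions made for the example.

```python
from collections import defaultdict


def build_email_graph(messages):
    """Build an undirected graph: account -> set of accounts it exchanged email with."""
    graph = defaultdict(set)
    for sender, recipient in messages:
        if sender == recipient:
            continue  # ignore self-addressed mail
        # Assumption: one message in either direction is enough for an edge.
        graph[sender].add(recipient)
        graph[recipient].add(sender)
    return dict(graph)


# Hypothetical sample log of (sender, recipient) pairs.
messages = [
    ("alice", "bob"),
    ("bob", "carol"),
    ("alice", "carol"),
    ("carol", "dave"),
]
graph = build_email_graph(messages)
```

Storing each neighborhood as a set keeps the later clique and centrality computations simple, since they mostly intersect neighbor sets.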


SNA Algorithm

• Two sets of statistics about a user’s “importance”
  – Average response time
    • The average time elapsed between a user sending an email to a correspondent and later receiving an email from that same correspondent
    • A received email counts as a “response” if it follows a sent email within three days
  – Cliques (maximal complete subgraphs)
    • Find all cliques in the graph
    • Assumption: users associated with a larger set and frequency of cliques will be ranked higher
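The response-time statistic described above could be computed roughly like this. The three-day window comes from the slide; the `(timestamp, sender, recipient)` record format, the choice to pair each reply with the earliest unanswered message, and the sample data are assumptions for illustration.

```python
from datetime import datetime, timedelta

RESPONSE_WINDOW = timedelta(days=3)  # from the slide: a reply must arrive within three days


def average_response_time(user, messages):
    """Mean delay between `user` mailing someone and that person's next mail back.

    `messages` is a list of (timestamp, sender, recipient) tuples.
    Returns a timedelta, or None if no response was observed.
    """
    pending = {}  # correspondent -> time of the earliest unanswered mail from `user`
    delays = []
    for ts, sender, recipient in sorted(messages):
        if sender == user:
            pending.setdefault(recipient, ts)
        elif recipient == user and sender in pending:
            delay = ts - pending.pop(sender)
            if delay <= RESPONSE_WINDOW:
                delays.append(delay)  # later replies are not counted as responses
    if not delays:
        return None
    return sum(delays, timedelta()) / len(delays)


# Hypothetical sample log.
t0 = datetime(2001, 1, 1, 9, 0)
log = [
    (t0, "alice", "bob"),
    (t0 + timedelta(hours=2), "bob", "alice"),           # response after 2 hours
    (t0 + timedelta(days=1), "alice", "carol"),
    (t0 + timedelta(days=5), "carol", "alice"),          # 4 days later: outside the window
    (t0 + timedelta(days=6), "alice", "bob"),
    (t0 + timedelta(days=6, hours=4), "bob", "alice"),   # response after 4 hours
]
avg = average_response_time("alice", log)
```

In this sample the two counted responses take 2 and 4 hours, so `avg` is a `timedelta` of 3 hours.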


Cliques


Communication Networks

• Number of cliques
  – The number of cliques that the account is contained within
• Raw clique score
  – A score computed using the size of the clique set
• Weighted clique score
  – A score computed using the “importance” of the people in each clique
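A sketch of the clique statistics, under stated assumptions. Maximal cliques are enumerated with the standard Bron-Kerbosch algorithm with pivoting; the slides do not say which enumeration method is used. They also do not give the raw score formula, so the `2 ** (n - 1)` contribution per clique of size `n` and the `min_size` cutoff are illustrative choices only.

```python
def maximal_cliques(graph):
    """Yield every maximal clique as a frozenset (Bron-Kerbosch with pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        # Pick the pivot with the most neighbors in p to prune branches.
        pivot = max(p | x, key=lambda v: len(graph[v] & p))
        for v in list(p - graph[pivot]):
            yield from expand(r | {v}, p & graph[v], x & graph[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(graph), set())


def clique_statistics(graph, min_size=2):
    """Return (number of cliques per account, raw clique score per account)."""
    counts = {v: 0 for v in graph}
    raw = {v: 0 for v in graph}
    for clique in maximal_cliques(graph):
        n = len(clique)
        if n < min_size:
            continue
        for v in clique:
            counts[v] += 1
            raw[v] += 2 ** (n - 1)  # assumed size-based score, not from the slides
    return counts, raw


# Hypothetical graph: a triangle alice-bob-carol plus the edge carol-dave.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
cliques = set(maximal_cliques(graph))
counts, raw = clique_statistics(graph)
```

A weighted variant would scale each clique's contribution by some importance measure of its members; that measure is left out here because the slides only describe it informally.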


Communication Networks

• Degree centrality
  – $\deg(v_i) = \sum_j a_{ij}$, where $a_{ij}$ is an entry of the adjacency matrix $A$ of $G$
• Clustering coefficient
  – How close the vertex and its neighbors are to being a clique
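Both measures are easy to compute on an adjacency-set representation. The sketch below uses the common local clustering coefficient (fraction of neighbor pairs that are themselves connected); the slides describe the measure only in words, so treating it as this exact formula is an assumption, as is the sample graph.

```python
from itertools import combinations


def degree(graph, v):
    """Degree of v: the row sum of the adjacency matrix for v."""
    return len(graph[v])


def clustering_coefficient(graph, v):
    """Fraction of pairs of v's neighbors that are directly connected."""
    neighbors = graph[v]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention chosen here for vertices with fewer than two neighbors
    links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
    return 2.0 * links / (k * (k - 1))


# Hypothetical graph: a triangle alice-bob-carol plus the edge carol-dave.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "carol"},
    "carol": {"alice", "bob", "dave"},
    "dave": {"carol"},
}
carol_degree = degree(graph, "carol")
carol_cc = clustering_coefficient(graph, "carol")
alice_cc = clustering_coefficient(graph, "alice")
```

Here carol has the highest degree, but only one of her three neighbor pairs is connected, while alice's neighborhood already forms a clique.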


Communication Networks

• Mean of shortest path length from a specific vertex to all vertices in the graph G
  – $L(v_i) = \frac{1}{n} \sum_j d_{ij}$, where $d_{ij} \in D$, $D$ is the geodesic distance matrix of $G$, and $n$ is the number of vertices
• Betweenness centrality
  – Proportion of all geodesics (shortest paths) between other vertices that pass through vertex $v_i$
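A minimal sketch of both path-based measures on an unweighted, connected graph. Distances come from breadth-first search, and betweenness uses Brandes' accumulation scheme, which is a standard technique and not necessarily what the JUNG-based implementation does. The betweenness value returned is the unnormalized sum, over pairs of other vertices, of the fraction of shortest paths passing through the vertex.

```python
from collections import deque


def mean_shortest_path(graph, source):
    """Average BFS distance from source to all vertices (assumes a connected graph)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return sum(dist.values()) / len(graph)


def betweenness(graph):
    """Unnormalized betweenness centrality for an undirected, unweighted graph."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        order = []                      # vertices in non-decreasing distance from s
        preds = {v: [] for v in graph}  # predecessors on shortest paths from s
        sigma = {v: 0 for v in graph}   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical path graph: a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
mean_a = mean_shortest_path(path, "a")
mean_b = mean_shortest_path(path, "b")
bc = betweenness(path)
```

On the path graph, the interior vertices b and c each lie on two shortest paths between other vertices, while the endpoints lie on none.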


Communication Networks

• “Hubs-and-authorities” importance
  – Calculates the “hubs-and-authorities” importance of each vertex
    • J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46, 1999.
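Kleinberg's hubs-and-authorities scores can be approximated by power iteration. The sketch below treats each email as a directed edge from sender to recipient; that direction, the fixed iteration count, and the sample edges are assumptions, since the slides do not say how the scores are applied to the email graph.

```python
import math


def hits(edges, iterations=50):
    """Return (hub, authority) score dicts for a list of directed (src, dst) edges."""
    nodes = {n for edge in edges for n in edge}
    out_links = {n: set() for n in nodes}
    in_links = {n: set() for n in nodes}
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)

    def normalize(scores):
        norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
        return {n: s / norm for n, s in scores.items()}

    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the vertices pointing at n.
        auth = normalize({n: sum(hub[m] for m in in_links[n]) for n in nodes})
        # Hub: sum of authority scores of the vertices n points at.
        hub = normalize({n: sum(auth[m] for m in out_links[n]) for n in nodes})
    return hub, auth


# Hypothetical directed email edges (sender, recipient).
edges = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
]
hub, auth = hits(edges)
top_authority = max(auth, key=auth.get)
```

In this sample carol receives mail from three accounts, so she ends up as the top authority, while the accounts writing to her score higher as hubs.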


Social Score

• Social score
  – Rank users from most important to least important
  – Group users which have similar social scores and clique connectivity
  – Determine n different levels of social hierarchy within which to place all the users
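One simple way to turn scores into a ranking and n levels is equal-width binning of the score range. This is only an illustrative stand-in: the slides also group by clique connectivity, which this sketch ignores, and both the binning rule and the sample scores are assumptions.

```python
def rank_users(scores):
    """Users ordered from most important to least important."""
    return sorted(scores, key=scores.get, reverse=True)


def assign_levels(scores, n_levels):
    """Map each user to a level 1..n_levels (1 = top) using equal-width score bins."""
    lo, hi = min(scores.values()), max(scores.values())
    width = (hi - lo) / n_levels or 1.0  # avoid dividing by zero when all scores match
    levels = {}
    for user, score in scores.items():
        bin_from_top = int((hi - score) / width)
        levels[user] = min(bin_from_top, n_levels - 1) + 1
    return levels


# Hypothetical social scores on a 0 to 100 scale.
scores = {"alice": 92.0, "bob": 60.0, "carol": 58.0, "dave": 10.0}
ranking = rank_users(scores)
levels = assign_levels(scores, 3)
```

With three levels, bob and carol land in the same middle level because their scores are close, which is the kind of grouping the slide describes.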


Compute Social Score

• Scale and normalize each statistic

• Social score
  – A score between 0 and 100
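The scaling and combination step might look like the sketch below. Min-max scaling to [0, 1] and a weighted average multiplied by 100 are assumptions; the slides state only that each statistic is scaled and normalized and that the result lies between 0 and 100. The statistic names, weights, and values are hypothetical.

```python
def min_max_scale(values):
    """Scale a {user: value} dict linearly onto [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {u: (v - lo) / span if span else 0.0 for u, v in values.items()}


def social_scores(statistics, weights):
    """Combine per-user statistics into one score between 0 and 100.

    statistics: {stat_name: {user: raw_value}}
    weights:    {stat_name: non-negative weight}
    """
    scaled = {name: min_max_scale(vals) for name, vals in statistics.items()}
    total_weight = sum(weights[name] for name in statistics)
    users = next(iter(statistics.values())).keys()
    return {
        u: 100.0 * sum(weights[n] * scaled[n][u] for n in statistics) / total_weight
        for u in users
    }


# Hypothetical statistics for four accounts.
statistics = {
    "degree": {"alice": 2, "bob": 2, "carol": 3, "dave": 1},
    "cliques": {"alice": 1, "bob": 1, "carol": 2, "dave": 1},
}
weights = {"degree": 1.0, "cliques": 1.0}
scores = social_scores(statistics, weights)
```

Because every scaled statistic lies in [0, 1] and the weights are divided by their sum, the result always falls between 0 and 100, and changing the weights shifts which statistics dominate the ranking.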


Results and Discussion

• Using EMT
  – Java-based email analysis engine built on a database back-end
  – The JUNG library is used for the degree and centrality measures
• Present the analysis of the North American West Power Traders division of Enron Corporation


Conclusions and Future Work

• The Enron dataset provides an excellent starting point of real-world data
• By varying the feature weights, it is possible to
  – Pick out the most important individuals
  – Group individuals with similar social qualities
  – Graphically draw an organization chart which approximately simulates the real social hierarchy


Conclusions and Future Work

• The concept of average response time can be reworked by considering the order of responses
• Consider each user's common email usage times and adjust the received time of emails accordingly
• New grouping and division algorithms are being considered
• Graph edges should be considered when arranging users into different levels
