Large Scale Social Networks Analysis, JOCLAD 2013

Large Scale Social Networks Analysis – LS SNA
Rui Sarmento, João Gama, Tiago Cunha, Albert Bifet
LIAAD/INESC TEC, FEP - University of Porto
April 13, 2013


Outline

1. Motivation
2. Software Tools
   – State of the art – Recent Evolution
   – PEGASUS
   – GraphLab
   – SNAP (Stanford Network Analysis Platform)
   – Other Tools
3. Case Study
   – Network of companies and financial organizations
   – Some Numbers
   – Algorithms and Tools Used
   – Processing Time
4. Summary & Conclusions

1. Motivation

Generic Problem: The huge amounts of data now available are difficult to analyze with regular hardware and/or software.

Example Facts: “We have produced more data in the last two years than in all of prior history so we are witnessing a Big Bang of Data” – Tim McGuire, McKinsey

Solution: Emerging technologies, such as modern parallel computing models, multicore computers, or even clusters of computers, can be very useful for analyzing massive network data.

Particular Case Study: CrunchBase database (accessed May 2012)

• Network A of companies and financial organizations/funds, e.g.:
  » Company Y has a connection to investment fund X
• Network B of persons and companies, e.g.:
  » Person A has a connection to company Y
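A minimal sketch of how the two networks could be held in memory, assuming each CrunchBase relationship has already been exported as a pair of identifiers. The function `build_bipartite` and the identifiers `company_Y`, `fund_X`, and `person_A` are illustrative placeholders taken from the example above, not names from the CrunchBase data or API.

```python
from collections import defaultdict


def build_bipartite(edges):
    """Build an undirected adjacency map from (left, right) pairs.

    Each pair links an entity of one type (for example a company) to an
    entity of the other type (for example an investment fund).
    """
    adjacency = defaultdict(set)
    for left, right in edges:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return dict(adjacency)


# Hypothetical sample records, named after the placeholders used above.
network_a = build_bipartite([("company_Y", "fund_X")])
network_b = build_bipartite([("person_A", "company_Y")])
```

Storing neighbors as sets keeps duplicate relationships from inflating later counts such as node degree.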

What can we do?
- We want to analyze the behavior of entities in terms of relationships or other influences.
- We want to determine characteristics of the network, both from the point of view of individual nodes (ego-centered) and of the network as a whole.

What is the problem?
- It takes too much time (many hours or even days) to do this with standard software such as Gephi or R, even on a good PC.

2. Software Tools

• State of the art – Recent Evolution
  2001 – Boost Graph Library (C++)
  2005 – Parallel BGL (C++), Hadoop (Java)
  2007 – Development of GraphLab starts
  2008 – SNAP: Small-world Network Analysis and Partitioning (C, OpenMP)
  ...
  2013 – Several graph frameworks using Hadoop and/or HDFS

• PEGASUS
  – Computation framework written in Java
  – An open-source graph-mining system with massive scalability
  – Depends on Hadoop
  – Graph-oriented tool

• GraphLab API
  – Computation framework written in C++
  – Computation in GraphLab is applied to dependent records, which are stored as vertices in a large distributed data graph
  – Computation in GraphLab is expressed as vertex programs, which are executed in parallel on each vertex and can interact with neighboring vertices
  – GraphLab programs interact by directly reading the state of neighboring vertices and by modifying the state of adjacent edges
  – HDFS integration: access your data directly from HDFS
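To make the vertex-program idea concrete, here is a minimal single-machine sketch in Python. It is not the GraphLab API (which is C++); the functions `pagerank_vertex_program` and `run_pagerank` are hypothetical names, and the update rule is one common PageRank formulation chosen only for illustration. Each call computes a vertex's new state by reading the state of its neighbors, which is the part a parallel engine would distribute.

```python
def pagerank_vertex_program(vertex, in_neighbors, out_degree, rank, damping=0.85):
    """Compute the new rank of one vertex from the state of its in-neighbors."""
    incoming = sum(rank[u] / out_degree[u] for u in in_neighbors[vertex])
    return (1.0 - damping) + damping * incoming


def run_pagerank(edges, iterations=20, damping=0.85):
    """Apply the vertex program to every vertex for a fixed number of rounds.

    A real engine would run the per-vertex calls in parallel; this sketch
    runs them sequentially on one machine.
    """
    vertices = {v for edge in edges for v in edge}
    in_neighbors = {v: [] for v in vertices}
    out_degree = {v: 0 for v in vertices}
    for src, dst in edges:
        in_neighbors[dst].append(src)
        out_degree[src] += 1

    rank = {v: 1.0 for v in vertices}
    for _ in range(iterations):
        # Every vertex reads the previous round's ranks, so the order of
        # evaluation within a round does not matter.
        rank = {
            v: pagerank_vertex_program(v, in_neighbors, out_degree, rank, damping)
            for v in vertices
        }
    return rank


# Hypothetical three-node directed cycle used as a smoke test.
ranks = run_pagerank([("a", "b"), ("b", "c"), ("c", "a")])
```

Because each round reads only the previous round's ranks, the per-vertex calls are independent of one another, which is what makes this style of computation easy to parallelize.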

• SNAP (Stanford Network Analysis Platform)
  – Not parallel; however...
  – The SNAP library is written in C++ and optimized for maximum performance and compact graph representation
  – It easily scales to massive networks with hundreds of millions of nodes and billions of edges
  – ...although some algorithms in SNAP might be slow due to their complexity

• Other Tools (Summary)
  – Several more tools are available:
    • Giraph – graph oriented
    • RHadoop (package for R and Hadoop) – generic tool

=> All of the previous tools depend on Hadoop, which seems to be more and more commonly adopted

Algorithms available from software install (graph analysis)

PEGASUS: Degree; PageRank; Random Walk with Restart (RWR); Radius; Connected Components

GraphLab: approximate diameter; kcore; pagerank; connected component; simple coloring; directed triangle count; format convert; sssp; undirected triangle count

SNAP: Cascades; Centrality; Cliques; Community; Concomp; Forestfire; Graphgen; Graphhash; Kcores; Kronem; Krongen; Kronfit; Maggen; Magfit; Motifs; Ncpplot; Netevol; Netinf; Netstat; Mkdatasets; Infopath

3. Case Study

=> Some Numbers
• Network of companies and financial organizations/funds
  1. Number of firms: 88,269
  2. Number of investment funds: 7,697
• Network of persons and companies
  1. Number of persons: 118,394

=> Algorithms and Tools Used
– Node degree with PEGASUS
– Friends of friends with Hadoop MapReduce
– Centrality measures with SNAP (Stanford Network Analysis Platform)
– Triangle counting with GraphLab
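For readers unfamiliar with these measures, the following is a small in-memory Python sketch of node degree, friends of friends, and triangle counting. It is not the PEGASUS, Hadoop, or GraphLab code used in the case study; all function names are hypothetical, the toy graph is invented, and node labels are assumed to be mutually comparable (for example, all strings). The friends-of-friends function mirrors the usual map and reduce steps in a single process.

```python
from collections import defaultdict
from itertools import combinations


def adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    return adj


def node_degree(adj):
    """Number of distinct neighbors of each node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def friends_of_friends(adj):
    """Map non-adjacent node pairs to their number of common neighbors.

    Map step: every node emits each pair of its own neighbors.
    Reduce step: emitted pairs are counted, and direct friends are dropped.
    """
    counts = defaultdict(int)
    for neighbors in adj.values():
        for u, v in combinations(sorted(neighbors), 2):
            counts[(u, v)] += 1
    return {pair: n for pair, n in counts.items() if pair[1] not in adj[pair[0]]}


def count_triangles(adj):
    """Total number of triangles in the undirected graph."""
    total = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # A common neighbor w > v closes the triangle u < v < w once.
                total += sum(1 for w in adj[u] & adj[v] if w > v)
    return total


# Hypothetical toy graph: triangle a-b-c plus a pendant node d attached to c.
adj = adjacency([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
degrees = node_degree(adj)
fof = friends_of_friends(adj)
triangles = count_triangles(adj)
```

On graphs the size of the case-study networks, the pair enumeration in the friends-of-friends step grows quadratically with node degree, which is one reason such computations are pushed to distributed tools.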

=> Processing Time

4. Summary & Conclusions

– This paper summarizes which tools to consider when working with big graphs.
– We are witnessing a proliferation of software tools aimed at the analysis of large-scale graphs.
– What was once a problem when dealing with these networks is now solved with the right tools.

References

• APACHE. 2012. Apache Giraph [Online]. The Apache Software Foundation. Available: http://incubator.apache.org/giraph/.
• CARNEGIE MELLON UNIVERSITY. 2012. Project Pegasus [Online]. Available: http://www.cs.cmu.edu/~pegasus/ [Accessed 2012].
• GRAPHLAB. Graphlab The Abstraction [Online]. Available: http://graphlab.org/home/abstraction/ [Accessed 2012].
• GRAPHLAB. 2012. Graph Analytics Toolkit [Online]. Available: http://graphlab.org/toolkits/graph-analytics/ [Accessed 2012].
• HOLMES, A. 2012. Hadoop In Practice, Manning.
• LESKOVEC, J. Stanford Network Analysis Platform [Online]. Available: http://snap.stanford.edu/snap/ [Accessed 12-2012].
• MAZZA, G. 2012. FrontPage - Hadoop Wiki [Online]. Available: http://wiki.apache.org/lucene-hadoop/ [Accessed 11-2012].
• MCGUIRE, T. Big Data Better Decisions [Online]. Available: http://www.slideshare.net/McK_CMSOForum/big-data-and-advanced-analytics [Accessed 05-03-2013].
• OWENS, J. R. 2013. Hadoop Real-World Solutions Cookbook. PACKT Publishing.
• THANEDAR, V. 2012. API Documentation [Online]. Available: http://developer.crunchbase.com/docs [Accessed 04-2012].
• UNIVERSITY OF WASHINGTON. What is Hadoop? [Online]. Available: http://escience.washington.edu/get-help-now/what-hadoop [Accessed 05-03-2013].

Thank You! Questions?