large scale social networks analysis joclad 2013
![Page 1: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/1.jpg)
Large Scale Social Networks Analysis – LS SNA
Rui Sarmento João Gama Tiago Cunha Albert Bifet
LIAAD/INESC TEC
FEP - University of Porto
April 13, 2013
![Page 2: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/2.jpg)
Outline – LS SNA 2/19
1. Motivation
2. Software Tools
– State of the art – Recent Evolution
– PEGASUS
– Graphlab
– Snap (Stanford Network Analysis Platform)
– Other Tools
3. Case Study
– Network of companies and financial organizations
– Some Numbers
– Algorithms and Tools Used
– Processing Time
4. Summary & Conclusions
![Page 3: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/3.jpg)
Generic Problem: The huge amounts of data available today are difficult to analyze with regular hardware and/or software.
Example Facts: “We have produced more data in the last two years than in all of prior history so we are witnessing a Big Bang of Data” – Tim McGuire, McKinsey
1.Motivation – LS SNA 3/19
![Page 4: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/4.jpg)
1.Motivation – LS SNA 4/19
Solution: Emerging technologies, such as modern parallel computing models, multicore computers, or even clusters of computers, can be very useful for analyzing massive network data.
![Page 5: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/5.jpg)
Particular Case Study: CrunchBase database (accessed May 2012)
• Network A of companies and financial organizations/funds, e.g.:
» Company Y has a connection to investment fund X
• Network B of persons and companies, e.g.:
» Person A has a connection to company Y
1.Motivation – LS SNA 5/19
[Diagrams: node Y linked to node X (Network A); node A linked to node Y (Network B)]
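The two networks on this slide can be thought of as edge lists. A minimal Python sketch follows, assuming an undirected adjacency-set representation; the names `company_Y`, `fund_X`, and `person_A` are placeholders taken from the slide's letters, not actual CrunchBase records, and `build_adjacency` is a hypothetical helper.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency


# Network A: companies and financial organizations/funds (placeholder names).
network_a_edges = [("company_Y", "fund_X")]
# Network B: persons and companies (placeholder names).
network_b_edges = [("person_A", "company_Y")]

network_a = build_adjacency(network_a_edges)
network_b = build_adjacency(network_b_edges)

# The same company appears in both networks with different kinds of neighbors.
print(sorted(network_a["company_Y"]))
print(sorted(network_b["company_Y"]))
```

Keeping the two networks in separate structures mirrors the slide; whether the original study merged them is not stated.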
![Page 6: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/6.jpg)
What can we do?
- We want to analyze the behavior of entities in terms of relationships or other influences.
- We want to determine some characteristics of the network, both from the self-centered (individual node) point of view and for the network as a whole.
What is the problem?
- It takes too much time (many hours or even days) to do this with regular software such as Gephi or R, even on a good PC.
1.Motivation – LS SNA 6/19
![Page 7: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/7.jpg)
• State of the art – Recent Evolution
2001 – Boost Graph Library (C++)
2005 – Parallel BGL (C++), Hadoop (Java)
2007 – Development of Graphlab starts
2008 – SNAP: Small-world Network Analysis and Partitioning (C, OpenMP)
..
2013 – Several graph frameworks using Hadoop and/or HDFS
2. Software Tools – LS SNA 7/19
![Page 8: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/8.jpg)
• PEGASUS
– Computation framework written in Java
– An open-source graph-mining system with massive scalability
– Depends on Hadoop
– Graph-oriented tool
2. Software Tools – LS SNA 8/19
![Page 9: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/9.jpg)
• Graphlab API
– Computation framework written in C++
– Computation in GraphLab is applied to dependent records, which are stored as vertices in a large distributed data-graph
– Computation in GraphLab is expressed as vertex-programs, which are executed in parallel on each vertex and can interact with neighboring vertices
– GraphLab programs interact by directly reading the state of neighboring vertices and by modifying the state of adjacent edges
– HDFS Integration: access your data directly from HDFS
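To make the vertex-program idea concrete, here is a rough single-machine Python sketch. It is not the GraphLab C++ API: `pagerank_vertex_program` and `run_vertex_programs` are hypothetical names, the update rule is an unnormalized PageRank chosen only for illustration, and the loop runs sequentially where GraphLab would run vertices in parallel.

```python
def pagerank_vertex_program(vertex, ranks, in_neighbors, out_degree, damping=0.85):
    """Compute one vertex's new rank by reading the state of its in-neighbors."""
    gathered = sum(ranks[n] / out_degree[n] for n in in_neighbors[vertex])
    return (1.0 - damping) + damping * gathered


def run_vertex_programs(edges, iterations=20):
    """Apply the vertex program to every vertex for a fixed number of rounds."""
    vertices = sorted({v for edge in edges for v in edge})
    in_neighbors = {v: [] for v in vertices}
    out_degree = {v: 0 for v in vertices}
    for source, target in edges:
        in_neighbors[target].append(source)
        out_degree[source] += 1

    ranks = {v: 1.0 for v in vertices}
    for _ in range(iterations):
        # Synchronous round: each vertex reads only the previous round's state,
        # so the per-vertex calls are independent and could run in parallel.
        ranks = {
            v: pagerank_vertex_program(v, ranks, in_neighbors, out_degree)
            for v in vertices
        }
    return ranks


# Toy directed cycle a -> b -> c -> a (made-up data).
toy_edges = [("a", "b"), ("b", "c"), ("c", "a")]
ranks = run_vertex_programs(toy_edges)
print(ranks)
```

Because each call depends only on the vertex and its neighbors, the same function could in principle be distributed across workers, which is the property the vertex-program abstraction relies on.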
2. Software Tools – LS SNA 9/19
![Page 10: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/10.jpg)
• Snap (Stanford Network Analysis Platform)
– Not parallel; however…
– The SNAP library is written in C++ and optimized for maximum performance and compact graph representation
– It easily scales to massive networks with hundreds of millions of nodes and billions of edges
– …although some algorithms in Snap might be slow due to their complexity
2. Software Tools – LS SNA 10/19
![Page 11: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/11.jpg)
• Other Tools (Summary)
– Several more tools are available:
• Giraph – graph oriented
• Rhadoop (package for R and Hadoop) – generic tool
=> All of the previous tools depend on Hadoop, which seems to be more and more commonly adopted
2. Software Tools – LS SNA 11/19
![Page 12: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/12.jpg)
Algorithms available from software install (graph analysis)

| Software | Algorithms |
| --- | --- |
| Pegasus | Degree, PageRank, Random Walk with Restart (RWR), Radius, Connected Components |
| Graphlab | approximate diameter, kcore, pagerank, connected component, simple coloring, directed triangle count, format convert, sssp, undirected triangle count |
| Snap | Cascades, Centrality, Cliques, Community, Concomp, Forestfire, Graphgen, Graphhash, Kcores, Kronem, Krongen, Kronfit, Maggen, Magfit, Motifs, Ncpplot, Netevol, Netinf, Netstat, Mkdatasets, infopath |
2. Software Tools – LS SNA 12/19
![Page 13: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/13.jpg)
3. Case Study – LS SNA 13/19
=> Some Numbers
• Network of companies and financial organizations/funds
1. Number of firms: 88,269
2. Number of investment funds: 7,697
• Network of persons and companies
1. Number of persons: 118,394
![Page 14: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/14.jpg)
=> Algorithms and Tools Used
– Node Degree with PEGASUS
– Friends of Friends with Hadoop MapReduce
– Centrality Measures with Snap (Stanford Network Analysis Platform)
– Triangle Counting with Graphlab
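The "Friends of Friends" computation fits the map/shuffle/reduce pattern. The sketch below simulates that pattern in plain Python on one machine; it is an assumption about how such a job could be structured, not the Hadoop job used in the case study, and all function names and the toy data are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(person, friends):
    """Emit (pair, value) records for one adjacency-list input row."""
    for friend in friends:
        # 0 marks a pair that is already directly connected.
        yield tuple(sorted((person, friend))), 0
    for first, second in combinations(sorted(friends), 2):
        # 1 means the two friends share `person` as a common neighbor.
        yield (first, second), 1


def shuffle(records):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in records:
        grouped[key].append(value)
    return grouped


def reduce_phase(values):
    """Return the common-neighbor count, or None if the pair is already linked."""
    if 0 in values:
        return None
    return sum(values)


def friends_of_friends(adjacency):
    """Map every row, shuffle the records, and reduce each pair."""
    records = [
        record
        for person, friends in adjacency.items()
        for record in map_phase(person, friends)
    ]
    result = {}
    for pair, values in shuffle(records).items():
        count = reduce_phase(values)
        if count is not None:
            result[pair] = count
    return result


# Toy undirected network (made-up names): ann and dan are not linked directly.
toy_adjacency = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "cat", "dan"],
    "cat": ["ann", "bob", "dan"],
    "dan": ["bob", "cat"],
}
fof = friends_of_friends(toy_adjacency)
print(fof)
```

Each surviving pair is a candidate second-degree connection, and its value is the number of neighbors the two nodes share.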
3. Case Study – LS SNA 14/19
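For reference, node degree and triangle counting can be written in a few lines on a single machine. The Python sketch below is not PEGASUS or Graphlab code, and the helper names and toy edges are illustrative assumptions; it only shows what the two measures compute.

```python
def build_neighbors(edges):
    """Build undirected neighbor sets, ignoring self-loops."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def node_degrees(neighbors):
    """Degree of a node is the number of distinct neighbors."""
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


def count_triangles(neighbors):
    """Count each triangle once by enforcing the ordering u < v < w."""
    total = 0
    for u in neighbors:
        for v in neighbors[u]:
            if u < v:
                common = neighbors[u] & neighbors[v]
                total += sum(1 for w in common if w > v)
    return total


# Toy undirected graph (made-up data): one triangle a-b-c plus a pendant node d.
toy_graph = build_neighbors([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
degrees = node_degrees(toy_graph)
triangles = count_triangles(toy_graph)
print(degrees, triangles)
```

On networks the size of the case study, this neighbor-intersection approach is the part that becomes expensive, which is where the parallel tools come in.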
![Page 15: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/15.jpg)
=> Processing Time
3. Case Study – LS SNA 15/19
![Page 16: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/16.jpg)
4. Summary & Conclusions LS SNA 16/19
• Summary & Conclusions
– This paper summarizes which tools to look for when studying big graphs.
– We are witnessing a proliferation of software tools aimed at the analysis of large-scale graphs.
– What was once a problem when dealing with these networks can now be solved with the right tools.
![Page 17: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/17.jpg)
References I – LS SNA 17/19
• APACHE. 2012. Apache Giraph [Online]. The Apache Software Foundation. Available: http://incubator.apache.org/giraph/.
• GRAPHLAB. Graphlab The Abstraction [Online]. Available: http://graphlab.org/home/abstraction/ [Accessed 2012].
• GRAPHLAB. 2012. Graph Analytics Toolkit [Online]. Available: http://graphlab.org/toolkits/graph-analytics/ [Accessed 2012].
• HOLMES, A. 2012. Hadoop In Practice, Manning.
• LESKOVEC, J. Stanford Network Analysis Platform [Online]. Available: http://snap.stanford.edu/snap/ [Accessed 12-2012].
• MAZZA, G. 2012. FrontPage - Hadoop Wiki [Online]. Available: http://wiki.apache.org/lucene-hadoop/ [Accessed 11-2012].
• THANEDAR, V. 2012. API Documentation [Online]. Available: http://developer.crunchbase.com/docs [Accessed 04-2012].
![Page 18: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/18.jpg)
References II – LS SNA 18/19
• CARNEGIE MELLON UNIVERSITY. 2012. Project Pegasus [Online]. Available: http://www.cs.cmu.edu/~pegasus/ [Accessed 2012].
• UNIVERSITY OF WASHINGTON. What is Hadoop? [Online]. Available: http://escience.washington.edu/get-help-now/what-hadoop [Accessed 05-03-2013].
• OWENS, J. R. 2013. Hadoop Real-World Solutions Cookbook. PACKT Publishing.
• MCGUIRE, T. Big Data Better Decisions [Online]. Available: http://www.slideshare.net/McK_CMSOForum/big-data-and-advanced-analytics [Accessed 05-03-2013].
![Page 19: Large scale social networks analysis joclad 2013](https://reader036.vdocuments.us/reader036/viewer/2022082804/54620e58af7959422a8b4b91/html5/thumbnails/19.jpg)
END – LS SNA 19/19
Thank You!
Questions?