Social Network Analysis in Your Problem Domain
![Page 1: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/1.jpg)
GRAPH DAY TEXAS TALK
Networks All Around Us: Discovering Networks in your Domain | 1/5/2015
Russell Jurney
http://bit.ly/socialnetworkanalysis
![Page 2: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/2.jpg)
RELATO MAPS
MARKET
![Page 3: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/3.jpg)
BACKGROUND
Serial Entrepreneur
Contributed code to Apache Druid, Apache Pig, Apache DataFu, Apache Whirr, Azkaban, MongoDB
Apache Committer
Three-time O'Reilly Author
Started & Shipped Product at E8 Security
Ning, LinkedIn, Hortonworks veteran
![Page 4: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/4.jpg)
2009 2010 2011
2012 2014
EXAMPLES OF NETWORKS
![Page 5: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/5.jpg)
FOUNDER
NETWORKS
node = company, edge = employment transition, as in people who worked at one startup, then founded another
![Page 6: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/6.jpg)
WEBSITE
BEHAVIOR
node = web page, edge = user browses one page, then another
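As a rough illustration of this node and edge definition, the sketch below turns browsing sessions into weighted directed edges. The `sessions` data and the `browse_edges` helper are hypothetical, invented for this example rather than taken from the talk.

```python
from collections import Counter

# Hypothetical sessions: each is the ordered list of pages one user visited.
sessions = [
    ["/home", "/pricing", "/signup"],
    ["/home", "/blog", "/pricing"],
    ["/home", "/pricing"],
]

def browse_edges(sessions):
    """Count directed edges (page, next_page) over all sessions."""
    edges = Counter()
    for pages in sessions:
        # Pair each page with the page the user browsed to next.
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges

edges = browse_edges(sessions)
for (src, dst), weight in edges.most_common():
    print(f"{src} -> {dst}  weight={weight}")
```

The edge weight here is simply how often the transition was observed; other weightings are equally plausible.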
![Page 7: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/7.jpg)
ONLINE SOCIAL
NETWORKS
node = LinkedIn profile, edge = linked connection
![Page 8: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/8.jpg)
EMAIL INBOX
node = email address, edge = sent email
![Page 9: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/9.jpg)
MARKETS
node = company, edge = partnership
![Page 10: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/10.jpg)
MARKET REPORTS
![Page 11: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/11.jpg)
TYPES OF NETWORKS
![Page 12: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/12.jpg)
TINKERPOP
“Marko Rodriguez is the Doug Cutting of graph analytics.” —Mark Twain
![Page 13: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/13.jpg)
PROPERTY
GRAPHS
![Page 14: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/14.jpg)
MULTI RELATIONAL TO SINGLE
RELATIONAL
g.E().hasLabel('friend').subgraph('friends').cap('friends').next()
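The same projection can be sketched outside a graph database. This is a minimal example, assuming edges are stored as `(source, label, target)` triples; the data and the `single_relational` helper are hypothetical.

```python
# Hypothetical multi-relational edge list: (source, label, target).
edges = [
    ("alice", "friend", "bob"),
    ("alice", "works_with", "carol"),
    ("bob", "friend", "carol"),
    ("carol", "manages", "alice"),
]

def single_relational(edges, label):
    """Return an adjacency dict containing only edges with the given label."""
    adjacency = {}
    for src, edge_label, dst in edges:
        if edge_label == label:
            adjacency.setdefault(src, set()).add(dst)
            # Make sure the target also appears as a node.
            adjacency.setdefault(dst, set())
    return adjacency

friends = single_relational(edges, "friend")
print(friends)
```

Once only one edge type remains, standard single-relational SNA measures apply directly.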
![Page 15: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/15.jpg)
final Graph g = TinkerFactory.createClassic();
try (final OutputStream os = new FileOutputStream("jsondump/links.json")) {
    GraphSONWriter.build().create().writeGraph(os, g);
}
EXPORT LINKS AS JSON
![Page 16: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/16.jpg)
THEN USE SNA
LIBRARIES
# Example - calculate friendship dispersion
import networkx as nx
import util  # helper module used by the talk, not shown in the slides

di_graph = nx.DiGraph()
all_edges = util.json_cr_file_2_array('jsondump/links.json')
for edge in all_edges:
    if 'type' in edge and edge['type'] == 'partnership':
        di_graph.add_edge(edge['domain1'], edge['domain2'])
dispersion = nx.dispersion(di_graph)
![Page 17: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/17.jpg)
A PROPERTY GRAPH IN
EVERY DATABASE
![Page 18: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/18.jpg)
PROPERTY GRAPHS IN YOUR DOMAIN
identify entities
identify relationships
specify schema (or not)
populate graph database
learn to think in graph walks (hard)
query in batch
query in realtime
![Page 19: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/19.jpg)
POPULATING A PROPERTY GRAPH
// Add nodes
while((json = company_reader.readLine()) != null) {
    document = jsonSlurper.parseText(json)
    v = graph.addVertex('company')
    v.property("_id", document._id)
    v.property("domain", document.domain)
    v.property("name", document.name)
}
![Page 20: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/20.jpg)
POPULATING A PROPERTY GRAPH
// Get a graph traverser
g = graph.traversal()
while((json = links_reader.readLine()) != null) {
    document = jsonSlurper.parseText(json)
    // Add edges to graph
    v1 = g.V().has('domain', document.home_domain).next()
    v2 = g.V().has('domain', document.link_domain).next()
    v1.addEdge(document.type, v2)
}
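For readers without a graph database at hand, the same two-pass load (vertices first, then edges) can be sketched in plain Python. The field names `_id`, `domain`, `name`, `home_domain`, `link_domain` and `type` come from the slides; the sample records and the in-memory `vertices` and `edges` containers are assumptions made for illustration.

```python
import json

# Hypothetical JSON-lines input, using the field names from the slides.
company_lines = [
    '{"_id": 1, "domain": "a.com", "name": "A"}',
    '{"_id": 2, "domain": "b.com", "name": "B"}',
]
link_lines = [
    '{"home_domain": "a.com", "link_domain": "b.com", "type": "partnership"}',
    '{"home_domain": "a.com", "link_domain": "missing.com", "type": "partnership"}',
]

vertices = {}  # domain -> property dict
edges = []     # (source domain, edge label, target domain)

# Pass 1: add one 'company' vertex per record, keyed by domain.
for line in company_lines:
    doc = json.loads(line)
    vertices[doc["domain"]] = {"label": "company", **doc}

# Pass 2: add edges, skipping links whose endpoints were never loaded.
for line in link_lines:
    doc = json.loads(line)
    if doc["home_domain"] in vertices and doc["link_domain"] in vertices:
        edges.append((doc["home_domain"], doc["type"], doc["link_domain"]))

print(len(vertices), "vertices,", len(edges), "edges")
```

Skipping dangling links is a design choice in this sketch; in the slide's version a missing endpoint would make `next()` fail instead.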
![Page 21: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/21.jpg)
TOOLS OF
SNA
SNA = Social Network Analysis
centrality
clustering
block models
cores
dispersion
center-pieces
![Page 22: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/22.jpg)
CENTRALITY
Centrality is a way of measuring how central or important a particular node is in a social network.
OR
What nodes should I care about?
![Page 23: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/23.jpg)
SINGLE-RELATIONAL CENTRALITY(S)
# all-links-the-same-type-centrality
g.V().out().groupCount()
# things-humans-walk-centrality
g.V().hasLabel('human').out('walks').groupCount()
# things-dogs-eat-centrality
g.V().hasLabel('dog').out('eats').groupCount()
![Page 24: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/24.jpg)
MULTI-RELATIONAL CENTRALITY(S)
# things-eaten-by-things-humans-walk-centrality
g.V().hasLabel('human').out('walks').out('eats').groupCount()
# things-hated-by-things-humans-pet-centrality
g.V().hasLabel('human').out('pets').out('hates').groupCount()
# things-that-pet-things-that-eat-mice-centrality
g.V().in('eats').in('pets').groupCount()
![Page 25: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/25.jpg)
CENTRALITIES
degree centrality
closeness centrality
betweenness centrality
eigenvector centrality
![Page 26: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/26.jpg)
DEGREE CENTRALITY
in-degree centrality is nice… it works even if you’re missing a node’s outbound links
![Page 27: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/27.jpg)
DEGREE CENTRALITY
# computation
count connections… it's that simple
in-degree centrality = popularity
out-degree centrality = gregariousness
# meaning
risk of catching cold
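Counting connections really is short in code. The sketch below uses a hypothetical list of directed `(source, target)` edges. Note that a traversal such as `g.V().out().groupCount()` tallies the vertices that outbound steps arrive at, which corresponds to the in-degree count here.

```python
from collections import Counter

# Hypothetical directed edges: (sender, recipient).
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c")]

# In-degree: how many edges point at each node (popularity).
in_degree = Counter(dst for _, dst in edges)

# Out-degree: how many edges leave each node (gregariousness).
out_degree = Counter(src for src, _ in edges)

print("in-degree:", dict(in_degree))
print("out-degree:", dict(out_degree))
```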
![Page 28: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/28.jpg)
DEGREE CENTRALITY IN GREMLIN
# all-links-the-same-type-centrality
g.V().out().groupCount()
![Page 29: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/29.jpg)
CLOSENESS CENTRALITY
# computation
count hops of all shortest paths
distance from all other nodes
reciprocal of farness
# meaning
communication efficiency
spread of information
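A minimal sketch of the "reciprocal of farness" computation, assuming an unweighted, undirected, connected graph stored as an adjacency dict. The graph and helper names are hypothetical, and normalized variants (for example multiplying by the number of other nodes) are also in common use.

```python
from collections import deque

# Hypothetical undirected path graph: a - b - c - d
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

def hop_distances(graph, source):
    """Breadth-first search: shortest hop count from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def closeness(graph, node):
    """Reciprocal of farness (sum of hop distances to all other reachable nodes)."""
    farness = sum(hop_distances(graph, node).values())
    return 1.0 / farness if farness else 0.0

scores = {node: closeness(graph, node) for node in graph}
print(scores)
```

The two middle nodes of the path score higher than the endpoints, matching the intuition that they can reach everyone in fewer hops.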
![Page 30: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/30.jpg)
CLOSENESS CENTRALITY IN GREMLIN
closenessCentrality = g.V().as("a").
    repeat(both('relationship_type').simplePath()).emit().as("b").
    dedup().by(select("a","b")).path().
    group().by(limit(local, 1)).by(count(local).map {1/it.get()}.sum())
![Page 31: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/31.jpg)
BETWEENNESS CENTRALITY
# computation
count of times node appears in shortest paths between all pairs of nodes
# meaning
control of communication between other nodes
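One common way to compute this is Brandes' algorithm, sketched below for an unweighted, undirected graph. This is an illustrative choice, not necessarily what the talk used. When several shortest paths connect a pair, each path contributes a fractional share rather than a full count. The example graph is hypothetical.

```python
from collections import deque

def betweenness(graph):
    """Brandes' algorithm for unweighted graphs given as {node: [neighbors]}."""
    score = {v: 0.0 for v in graph}
    for s in graph:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating pair dependencies.
        delta = {v: 0.0 for v in graph}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score

# Hypothetical undirected path graph: a - b - c - d
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}

# Each unordered pair is visited from both ends, so halve the totals.
scores = {v: total / 2 for v, total in betweenness(graph).items()}
print(scores)
```

On the path graph, `b` sits on the shortest paths for the pairs (a, c) and (a, d), and `c` on those for (a, d) and (b, d), so both score 2 while the endpoints score 0.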
![Page 32: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/32.jpg)
EIGENVECTOR CENTRALITY
# computation
counts connections of connected nodes
more connected neighbors matter more
# meaning
influence of one node on others
pagerank is an eigenvector centrality
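A minimal power-iteration sketch of eigenvector centrality for an undirected graph. As an assumption made for numerical stability, each node's own score is added to the sum of its neighbors' scores at every step, which avoids oscillation on bipartite graphs without changing the final ranking. The function name and example graph are hypothetical.

```python
def eigenvector_centrality(graph, iterations=100):
    """Power iteration on an undirected graph given as {node: [neighbors]}."""
    score = {v: 1.0 for v in graph}
    for _ in range(iterations):
        # Each node's new score is its old score plus its neighbors' scores.
        new = {v: score[v] + sum(score[w] for w in graph[v]) for v in graph}
        # Rescale so the scores keep unit Euclidean length.
        norm = sum(x * x for x in new.values()) ** 0.5
        score = {v: x / norm for v, x in new.items()}
    return score

# Hypothetical graph: triangle a-b-c with an extra node d attached to a.
graph = {"a": ["b", "c", "d"], "b": ["a", "c"], "c": ["a", "b"], "d": ["a"]}

scores = eigenvector_centrality(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Node `d` has one connection, but that connection is to the best-connected node, so its score reflects its neighbor's importance rather than just its own degree.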
![Page 33: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/33.jpg)
EIGENVECTOR CENTRALITY IN GREMLIN
g.V().
    repeat(out('relationship_type').groupCount('m').by('unique_key')).
    times(n).cap('m')
![Page 34: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/34.jpg)
CLUSTERING
![Page 35: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/35.jpg)
CLUSTERING
property based clustering: k-means
graph based clustering: modularity
property graph based clustering: CESNA
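For the graph based case, modularity scores how well a given partition separates densely connected groups. The sketch below computes Newman's modularity for an undirected, unweighted graph; it evaluates a partition rather than searching for the best one. The example graph and the `split` and `lumped` partitions are hypothetical.

```python
def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: {node: community id}.
    """
    m = len(edges)
    inside = {}  # community -> number of edges with both ends inside it
    degree = {}  # community -> total degree of its members
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    # Observed share of internal edges minus the share expected by chance.
    return sum(inside.get(c, 0) / m - (degree[c] / (2 * m)) ** 2
               for c in degree)

# Hypothetical graph: two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
lumped = {node: 0 for node in split}

print(modularity(edges, split), modularity(edges, lumped))
```

Splitting the graph at the bridge scores higher than putting every node in one community, which is the signal modularity-based clustering methods try to maximize.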
![Page 36: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/36.jpg)
BLOCK MODELS
how much do clusters connect?
are links reciprocal?
Circos plots are helpful
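Both questions can be answered with simple counts once nodes have cluster assignments. This is a rough sketch, assuming directed edges and a hypothetical `cluster` mapping; real block models usually go further and fit the block structure itself.

```python
from collections import Counter

# Hypothetical directed edges and cluster assignment.
edges = [("a", "b"), ("b", "a"), ("a", "c"), ("c", "d"), ("d", "c"), ("d", "a")]
cluster = {"a": "X", "b": "X", "c": "Y", "d": "Y"}

# Block matrix: number of edges from each cluster to each cluster.
blocks = Counter((cluster[u], cluster[v]) for u, v in edges)

# Reciprocity: share of edges whose reverse edge is also present.
edge_set = set(edges)
reciprocity = sum((v, u) in edge_set for u, v in edge_set) / len(edge_set)

print(dict(blocks))
print(round(reciprocity, 3))
```

The block counts are the kind of cluster-to-cluster matrix that a chord-style diagram visualizes.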
![Page 37: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/37.jpg)
CORES
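The slide gives only the heading, so as an assumption about what is meant: a k-core is the largest subgraph in which every node has at least k neighbors inside the subgraph, and a node's core number is the largest k for which it belongs to a k-core. The sketch below finds core numbers by repeatedly peeling off the lowest-degree node; the function name and example graph are hypothetical.

```python
def core_numbers(graph):
    """Core number of each node in an undirected graph {node: set(neighbors)}."""
    degree = {v: len(neighbors) for v, neighbors in graph.items()}
    remaining = set(graph)
    core = {}
    k = 0
    while remaining:
        # Peel off the node with the smallest remaining degree.
        v = min(remaining, key=degree.get)
        # The core level never decreases as peeling proceeds.
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in graph[v]:
            if w in remaining:
                degree[w] -= 1
    return core

# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

core = core_numbers(graph)
print(core)
```

This version rescans for the minimum on every step, which is fine for small graphs; bucket-based implementations do the same peeling in linear time.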
![Page 38: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/38.jpg)
DISPERSION
Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook
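A minimal sketch of the unnormalized dispersion measure described in the paper cited above, assuming an undirected graph stored as an adjacency dict of sets. For a tie u-v it counts pairs of mutual friends who are not linked to each other and share no neighbor in u's ego network other than u and v. The function and example graph are hypothetical; the paper also defines normalized and recursive variants not shown here.

```python
from itertools import combinations

def dispersion(graph, u, v):
    """Unnormalized dispersion of the u-v tie.

    graph: undirected adjacency {node: set(neighbors)}.
    """
    ego = graph[u] | {u}          # u's ego network
    common = graph[u] & graph[v]  # mutual friends of u and v
    total = 0
    for s, t in combinations(sorted(common), 2):
        if t in graph[s]:
            continue  # s and t are directly linked
        shared = (graph[s] & graph[t] & ego) - {u, v}
        if not shared:
            total += 1  # s and t are connected only through u and v
    return total

# Hypothetical graph: u and v share friends a, b, c; only a and b know each other.
graph = {
    "u": {"v", "a", "b", "c"},
    "v": {"u", "a", "b", "c"},
    "a": {"u", "v", "b"},
    "b": {"u", "v", "a"},
    "c": {"u", "v"},
}

print(dispersion(graph, "u", "v"), dispersion(graph, "u", "a"))
```

The u-v tie scores higher than the u-a tie because the mutual friends of u and v come from otherwise unconnected parts of u's network.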
![Page 39: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/39.jpg)
CENTER-PIECE SUBGRAPHS
*Slide stolen from Tong, Faloutsos, Pan
![Page 40: Social Network Analysis in Your Problem Domain](https://reader031.vdocuments.us/reader031/viewer/2022030402/58a2ce3f1a28ab692e8b4817/html5/thumbnails/40.jpg)
Russell Jurney, CEO [email protected] twitter.com/rjurney 404-317-3620
http://bit.ly/socialnetworkanalysis