Networks All Around Us: Extracting networks from your problem domain
TRANSCRIPT
![Page 1: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/1.jpg)
DATA DAY TEXAS
Networks All Around Us: Discovering Networks in your Domain | 1/5/2015
Russell Jurney
![Page 2: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/2.jpg)
RELATO MAPS
MARKET
![Page 3: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/3.jpg)
BACKGROUND
Serial Entrepreneur
Contributed code to Apache Druid, Apache Pig, Apache DataFu, Apache Whirr, Azkaban, MongoDB
Apache Committer
Three-time O'Reilly Author
Started & Shipped Product at E8 Security
Ning, LinkedIn, Hortonworks veteran
![Page 4: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/4.jpg)
2009 2010 2011
2012 2014
EXAMPLES OF NETWORKS
![Page 5: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/5.jpg)
FOUNDER NETWORKS
node = company, edge = employment transition, as in people who worked at one startup, then founded another
![Page 6: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/6.jpg)
WEBSITE BEHAVIOR
node = web page, edge = user browses one page, then another
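As a rough illustration of how such a network could be pulled out of clickstream data, here is a minimal sketch in plain Python (not the Gremlin used later in the deck). The `sessions` data and the `browse_edges` helper are hypothetical, invented for this example.

```python
from collections import Counter

def browse_edges(sessions):
    """Count page-to-page transitions across browsing sessions.

    Each session is an ordered list of pages visited by one user.
    Returns a Counter mapping (from_page, to_page) -> number of transitions.
    """
    edges = Counter()
    for pages in sessions:
        # Pair each page with the page visited immediately after it.
        for src, dst in zip(pages, pages[1:]):
            edges[(src, dst)] += 1
    return edges

# Hypothetical sessions, invented for illustration.
sessions = [
    ["/home", "/pricing", "/signup"],
    ["/home", "/blog", "/pricing"],
    ["/home", "/pricing"],
]
edges = browse_edges(sessions)
```

The counts can serve as edge weights, so frequently travelled paths stand out from one-off clicks.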
![Page 7: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/7.jpg)
ONLINE SOCIAL NETWORKS
node = LinkedIn profile, edge = linked connection
![Page 8: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/8.jpg)
EMAIL INBOX
node = email address, edge = sent email
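One way to sketch this extraction uses Python's standard `email` package. The raw messages below are made up, and `email_edges` is a hypothetical helper, not something from the talk.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses, parseaddr

def email_edges(raw_messages):
    """Build (sender, recipient) -> message count from raw RFC 822 messages."""
    edges = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        _, sender = parseaddr(msg.get("From", ""))
        # To and Cc headers may hold several comma-separated addresses.
        recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
        for _, recipient in recipients:
            if sender and recipient:
                edges[(sender.lower(), recipient.lower())] += 1
    return edges

# Hypothetical messages, invented for illustration.
raw_messages = [
    "From: Alice <alice@example.com>\nTo: bob@example.com, Carol <carol@example.com>\nSubject: hi\n\nHello",
    "From: bob@example.com\nTo: alice@example.com\nSubject: re: hi\n\nHi back",
    "From: Alice <alice@example.com>\nTo: bob@example.com\nCc: carol@example.com\n\nOne more",
]
edges = email_edges(raw_messages)
```

Edges are directed here (sender to recipient), which keeps in-degree and out-degree distinguishable later on.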
![Page 9: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/9.jpg)
MARKETS
node = company, edge = partnership
![Page 10: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/10.jpg)
TYPES OF NETWORKS
![Page 11: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/11.jpg)
TINKERPOP
“Marko Rodriguez is the Doug Cutting of graph analytics.” —Mark Twain
![Page 12: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/12.jpg)
PROPERTY GRAPHS
![Page 13: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/13.jpg)
A PROPERTY GRAPH IN EVERY DATABASE
![Page 14: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/14.jpg)
PROPERTY GRAPHS IN YOUR DOMAIN
identify entities
identify relationships
specify schema (or not)
populate graph database
learn to think in graph walks
query in batch
query in realtime
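To make the idea of a property graph concrete, the following is a minimal in-memory sketch in plain Python, assuming labeled vertices and edges that each carry a dictionary of properties. The `PropertyGraph` class and the company names are hypothetical; a real deployment would use a graph database instead.

```python
from dataclasses import dataclass, field

@dataclass
class Vertex:
    id: int
    label: str
    properties: dict = field(default_factory=dict)

@dataclass
class Edge:
    label: str
    out_v: int  # id of the source vertex
    in_v: int   # id of the target vertex
    properties: dict = field(default_factory=dict)

class PropertyGraph:
    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, label, **properties):
        v = Vertex(len(self.vertices), label, properties)
        self.vertices[v.id] = v
        return v

    def add_edge(self, label, out_v, in_v, **properties):
        e = Edge(label, out_v.id, in_v.id, properties)
        self.edges.append(e)
        return e

    def out(self, vertex, label=None):
        """One step of a graph walk: follow outgoing edges, optionally by label."""
        return [self.vertices[e.in_v] for e in self.edges
                if e.out_v == vertex.id and (label is None or e.label == label)]

# Hypothetical companies, invented for illustration.
graph = PropertyGraph()
acme = graph.add_vertex("company", name="Acme", domain="acme.example")
globex = graph.add_vertex("company", name="Globex", domain="globex.example")
initech = graph.add_vertex("company", name="Initech", domain="initech.example")
graph.add_edge("partnership", acme, globex, since=2014)
graph.add_edge("customer", acme, initech)

partners = [v.properties["name"] for v in graph.out(acme, "partnership")]
```

The `out` method is the smallest possible "graph walk": start at a vertex, pick an edge label, and land on the neighbors.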
![Page 15: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/15.jpg)
POPULATING A PROPERTY GRAPH
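    import groovy.json.JsonSlurper

    // company_reader is assumed to be a BufferedReader over a file of JSON lines
    jsonSlurper = new JsonSlurper()

    // Add nodes
    while((json = company_reader.readLine()) != null) {
      document = jsonSlurper.parseText(json)
      v = graph.addVertex('company')
      v.property("_id", document._id)
      v.property("domain", document.domain)
      v.property("name", document.name)
    }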
![Page 16: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/16.jpg)
POPULATING A PROPERTY GRAPH
    // Get a graph traverser
    g = graph.traversal()

    while((json = links_reader.readLine()) != null) {
      document = jsonSlurper.parseText(json)

      // Add edges to graph
      v1 = g.V().has('domain', document.home_domain).next()
      v2 = g.V().has('domain', document.link_domain).next()
      v1.addEdge(document.type, v2)
    }
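The same two-pass load (vertices first, then edges looked up by domain) can be sketched in plain Python. This is an illustration under assumptions, not the talk's loader: the field names come from the slides, while `load_graph` and the sample records are hypothetical. Unlike `.next()` on an empty traversal, which throws, this sketch skips links whose endpoints are unknown and counts them.

```python
import json

def load_graph(company_lines, link_lines):
    """Load JSON-lines records into a simple vertex dict and edge list."""
    vertices = {}  # domain -> vertex properties
    edges = []     # (home_domain, edge_type, link_domain)
    for line in company_lines:
        doc = json.loads(line)
        vertices[doc["domain"]] = {"_id": doc["_id"], "name": doc["name"]}

    skipped = 0
    for line in link_lines:
        doc = json.loads(line)
        # Only add an edge when both endpoints were loaded as vertices.
        if doc["home_domain"] in vertices and doc["link_domain"] in vertices:
            edges.append((doc["home_domain"], doc["type"], doc["link_domain"]))
        else:
            skipped += 1
    return vertices, edges, skipped

# Hypothetical records, invented for illustration.
company_lines = [
    '{"_id": 1, "domain": "acme.example", "name": "Acme"}',
    '{"_id": 2, "domain": "globex.example", "name": "Globex"}',
]
link_lines = [
    '{"home_domain": "acme.example", "link_domain": "globex.example", "type": "partnership"}',
    '{"home_domain": "acme.example", "link_domain": "unknown.example", "type": "customer"}',
]
vertices, edges, skipped = load_graph(company_lines, link_lines)
```

Keying vertices by domain gives constant-time lookups while adding edges, which matters once the link file is large.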
![Page 17: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/17.jpg)
TOOLS OF SNA
SNA = Social Network Analysis
centrality
clustering
block models
cores
dispersion
![Page 18: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/18.jpg)
CENTRALITY
Centrality is a way of measuring how central or important a particular node is in a social network.
OR
What nodes should I care about?
![Page 19: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/19.jpg)
SINGLE-RELATIONAL CENTRALITY(S)
    // all-links-the-same-type-centrality
    g.V().out().groupCount()

    // things-humans-walk-centrality
    g.V().hasLabel('human').out('walks').groupCount()

    // things-dogs-eat-centrality
    g.V().hasLabel('dog').out('eats').groupCount()
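For readers without a Gremlin console at hand, the effect of `out().groupCount()` can be imitated in plain Python: every start vertex sends a walker along each matching edge, and the walkers arriving at each vertex are counted. The toy graph and the `out` and `has_label` helpers below are hypothetical stand-ins, not TinkerPop APIs.

```python
from collections import Counter

# Hypothetical toy graph: vertex -> label, plus (source, edge label, target) triples.
labels = {"alice": "human", "bob": "human", "rex": "dog",
          "fido": "dog", "kibble": "food", "steak": "food"}
edges = [
    ("alice", "walks", "rex"),
    ("bob", "walks", "rex"),
    ("bob", "walks", "fido"),
    ("rex", "eats", "kibble"),
    ("fido", "eats", "kibble"),
    ("fido", "eats", "steak"),
]

def has_label(vertex_label):
    """All vertices carrying the given label."""
    return [v for v, lbl in labels.items() if lbl == vertex_label]

def out(starts, edge_label=None):
    """Follow outgoing edges from each start vertex; one result per edge walked."""
    return [dst for v in starts for (src, lbl, dst) in edges
            if src == v and (edge_label is None or lbl == edge_label)]

all_links = Counter(out(labels))                    # like g.V().out().groupCount()
walked = Counter(out(has_label("human"), "walks"))  # things humans walk
eaten = Counter(out(has_label("dog"), "eats"))      # things dogs eat
```

Restricting the start label and the edge label changes what "central" means, which is the point of naming each centrality after its walk.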
![Page 20: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/20.jpg)
MULTI-RELATIONAL CENTRALITY(S)
    // things-eaten-by-things-humans-walk-centrality
    g.V().hasLabel('human').out('walks').out('eats').groupCount()

    // things-hated-by-things-humans-pet-centrality
    g.V().hasLabel('human').out('pets').out('hates').groupCount()

    // things-that-pet-things-that-eat-mice-centrality
    // (assumes mice carry the label 'mouse'; without this filter the walk starts from every vertex)
    g.V().hasLabel('mouse').in('eats').in('pets').groupCount()
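Chaining two steps can be imitated in plain Python as well. The sketch below assumes a small made-up graph; `out`, `in_`, and `has_label` are hypothetical helpers that only mimic the shape of the Gremlin steps. Note that a vertex reached along several paths is counted once per path.

```python
from collections import Counter

# Hypothetical toy graph, invented for illustration.
labels = {"alice": "human", "bob": "human", "rex": "dog",
          "tom": "cat", "jerry": "mouse", "kibble": "food"}
edges = [
    ("alice", "walks", "rex"),
    ("bob", "walks", "rex"),
    ("alice", "pets", "tom"),
    ("bob", "pets", "tom"),
    ("bob", "pets", "rex"),
    ("rex", "eats", "kibble"),
    ("tom", "eats", "jerry"),
    ("rex", "hates", "tom"),
    ("tom", "hates", "rex"),
]

def has_label(vertex_label):
    return [v for v, lbl in labels.items() if lbl == vertex_label]

def out(walkers, edge_label):
    """Move every walker along its outgoing edges with the given label."""
    return [dst for v in walkers for (src, lbl, dst) in edges
            if src == v and lbl == edge_label]

def in_(walkers, edge_label):
    """Move every walker backwards along incoming edges with the given label."""
    return [src for v in walkers for (src, lbl, dst) in edges
            if dst == v and lbl == edge_label]

humans = has_label("human")
eaten_by_walked = Counter(out(out(humans, "walks"), "eats"))
hated_by_petted = Counter(out(out(humans, "pets"), "hates"))
pet_mouse_eaters = Counter(in_(in_(has_label("mouse"), "eats"), "pets"))
```

Here `kibble` scores 2 because two different humans walk the dog that eats it: the count reflects paths, not distinct endpoints.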
![Page 21: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/21.jpg)
CENTRALITIES
degree centrality
closeness centrality
betweenness centrality
eigenvector centrality
![Page 22: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/22.jpg)
DEGREE CENTRALITY
in-degree centrality is nice… it works even if you’re missing a node’s outbound links
![Page 23: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/23.jpg)
DEGREE CENTRALITY
# computation
count connections …it's that simple
in-degree centrality = popularity
out-degree centrality = gregariousness
# meaning
risk of catching cold
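Counting connections takes only a few lines of plain Python. The edge list below is hypothetical (think sender and recipient of an email), and `degree_centrality` is an illustrative helper rather than a library call.

```python
from collections import Counter

def degree_centrality(edges):
    """Return (in_degree, out_degree) Counters for a directed edge list."""
    in_degree, out_degree = Counter(), Counter()
    for src, dst in edges:
        out_degree[src] += 1  # gregariousness: links a node sends
        in_degree[dst] += 1   # popularity: links a node receives
    return in_degree, out_degree

# Hypothetical directed edges, invented for illustration.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("dave", "carol")]
in_degree, out_degree = degree_centrality(edges)
most_popular = in_degree.most_common(1)[0][0]
```

In-degree only needs the edges that point at a node, so it can still be computed for a node whose own outbound links were never collected.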
![Page 24: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/24.jpg)
CLOSENESS CENTRALITY
# computation
count hops of all shortest paths
distance from all other nodes
reciprocal of farness
# meaning
communication efficiency
spread of information
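A minimal sketch of this computation, assuming a connected, unweighted, undirected graph stored as an adjacency dict: breadth-first search gives hop counts, their sum is the farness, and the score is the normalized reciprocal, (n - 1) / farness. The normalization by n - 1 is a common convention chosen here, not something stated on the slide.

```python
from collections import deque

def closeness(adj):
    """Closeness centrality for a connected, unweighted, undirected graph."""
    scores = {}
    for source in adj:
        # Breadth-first search yields shortest hop counts from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        farness = sum(dist.values())
        scores[source] = (len(dist) - 1) / farness if farness else 0.0
    return scores

# Hypothetical path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = closeness(adj)
```

The two middle nodes score 0.75 and the two ends 0.5: information starting in the middle reaches everyone in fewer hops.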
![Page 25: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/25.jpg)
BETWEENNESS CENTRALITY
# computation
count of times node appears in shortest paths between all pairs of nodes
# meaning
control of communication between other nodes
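Enumerating every shortest path directly gets expensive, so a common approach is Brandes' algorithm, sketched below for an unweighted, undirected graph. The talk does not name an algorithm; this is one standard choice, and the adjacency dict is made up.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs (unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}

# Hypothetical path graph a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
scores = betweenness(adj)
```

On this path graph, `b` sits between the pairs (a, c) and (a, d), and `c` between (a, d) and (b, d), so each scores 2 while the end nodes score 0.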
![Page 26: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/26.jpg)
EIGENVECTOR CENTRALITY
# computation
counts connections of connected nodes
more connected neighbors matter more
# meaning
influence of one node on others
PageRank is an eigenvector centrality
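Since PageRank is the best-known member of this family, here is a minimal power-iteration sketch in plain Python. The damping factor of 0.85, the iteration count, and the even redistribution of rank from nodes without outbound links are conventional assumptions, and the link structure is made up.

```python
def pagerank(out_links, damping=0.85, iterations=100):
    """Power-iteration PageRank; every link target must also be a key of out_links."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outbound links is spread over everyone.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in out_links[v]:
                # A node splits its rank evenly among the nodes it links to.
                new_rank[w] += damping * rank[v] / len(out_links[v])
        rank = new_rank
    return rank

# Hypothetical links: a and b both point at c, and c points at d.
out_links = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}
rank = pagerank(out_links)
```

In this example `d` has a single inbound link but ends up ranked above `c`, because that link comes from the best-connected node. That is the sense in which more connected neighbors matter more.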
![Page 27: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/27.jpg)
CLUSTERING
![Page 28: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/28.jpg)
CLUSTERING
property based clustering: k-means
graph based clustering: modularity
property graph based clustering: CESNA
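To give a feel for the graph-based option, the sketch below scores a given partition with Newman's modularity, Q = sum over communities of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the edges inside community c, and d_c the total degree of its nodes. It only evaluates a partition and does not search for one; the graph and community assignments are hypothetical.

```python
from collections import Counter

def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = Counter()  # edges with both endpoints in the same community
    degree = Counter()    # summed degree of each community's nodes
    for u, v in edges:
        degree[community_of[u]] += 1
        degree[community_of[v]] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical graph: two triangles joined by a single edge (c - d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {v: 0 for v in two_groups}

q_split = modularity(edges, two_groups)
q_merged = modularity(edges, one_group)
```

Splitting the graph along the bridge gives Q = 5/14, while lumping everything together gives 0. Community-detection methods look for the partition with the highest Q.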
![Page 29: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/29.jpg)
BLOCK MODELS
how much do clusters connect?
are links reciprocal?
Circos plots are helpful
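Both questions can be answered with simple counts once every node has a cluster assignment. The sketch below is an illustration with made-up data; `block_counts` and `reciprocity` are hypothetical helper names.

```python
from collections import Counter

def block_counts(edges, cluster_of):
    """Count directed edges between each ordered pair of clusters."""
    counts = Counter()
    for src, dst in edges:
        counts[(cluster_of[src], cluster_of[dst])] += 1
    return counts

def reciprocity(edges):
    """Fraction of distinct directed edges whose reverse edge also exists."""
    edge_set = set(edges)
    mutual = sum(1 for (src, dst) in edge_set if (dst, src) in edge_set)
    return mutual / len(edge_set)

# Hypothetical directed edges and cluster assignments.
edges = [("a", "b"), ("b", "a"), ("a", "c"),
         ("c", "d"), ("d", "c"), ("b", "d")]
cluster_of = {"a": "X", "b": "X", "c": "Y", "d": "Y"}

blocks = block_counts(edges, cluster_of)
share_reciprocal = reciprocity(edges)
```

The `blocks` counter is effectively the block matrix: here cluster X links to Y twice, but Y never links back to X.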
![Page 30: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/30.jpg)
CORES
![Page 31: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/31.jpg)
DISPERSION
Romantic Partnerships and the Dispersion of Social Ties: A Network Analysis of Relationship Status on Facebook
![Page 32: Networks All Around Us: Extracting networks from your problem domain](https://reader031.vdocuments.us/reader031/viewer/2022030211/58a2ce3f1a28ab692e8b4819/html5/thumbnails/32.jpg)
Russell Jurney, CEO [email protected] twitter.com/rjurney 404-317-3620