Finding Insights in Connected Data: Using Graph Databases in Journalism

Posted on 16-Apr-2017


Finding Insights in Connected Data: Graph Databases in Journalism

NICAR 2016, Denver

William Lyon (@lyonwj)

About

Software Developer @Neo4j

will@neo4j.com

@lyonwj

lyonwj.com

William Lyon

Agenda

• What is a graph database?
• Why graphs in journalism?
• Demo 1: Graphing US Congress
• Demo 2: Hillary Clinton email dataset

What is a graph?

[Figure: a chart contrasted with a graph, followed by example graphs: product recommendations (VIEWED, BOUGHT), an organization chart (MANAGES, LEADS, REGION, COLLABORATES), and a fraud ring linking account holders through shared addresses, phone numbers, SSNs, bank accounts, credit cards, and loans.]

Graph Databases in Journalism

Graph Databases

Software that stores & queries data as a graph.

Graph Database

• Property graph data model
• Nodes and relationships
• Native graph processing
• Cypher query language

neo4j.com

Why graph databases in journalism?


bills.csv

committees.csv

votes.csv

https://www.govtrack.us/developers


SELECT l.name, c.jurisdiction
FROM legislators l
LEFT JOIN committee c ON c.member_ID = l.thomasID
WHERE c.thomasID = 'HSAP'
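To make the relational version concrete, here is a minimal Python sketch that runs the query above against an in-memory SQLite database using the standard library's sqlite3 module. The table and column names follow the slide. The rows, the jurisdiction strings, and the helper name `committee_members` are hypothetical.

```python
import sqlite3

# In-memory database; table and column names follow the slide's query,
# but the rows (and jurisdiction strings) are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legislators (thomasID TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE committee (thomasID TEXT, member_ID TEXT, jurisdiction TEXT);
    INSERT INTO legislators VALUES ('00001', 'Legislator A'), ('00002', 'Legislator B');
    INSERT INTO committee VALUES
        ('HSAP', '00001', 'Appropriations'),
        ('HSAG', '00002', 'Agriculture');
""")

def committee_members(committee_id):
    """Answer 'who sits on this committee?' with a join on member_ID = thomasID."""
    return conn.execute(
        """
        SELECT l.name, c.jurisdiction
        FROM legislators l
        LEFT JOIN committee c ON c.member_ID = l.thomasID
        WHERE c.thomasID = ?
        """,
        (committee_id,),
    ).fetchall()

print(committee_members("HSAP"))  # [('Legislator A', 'Appropriations')]
```

Every new relationship question (committee membership, sponsorship, votes) needs another join like this one, which is the pain point the graph model targets.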

SQL and ER Diagrams

Relational Versus Graph Models

[Figure: the same friendship data for Andreas, Tobias, Mica, and Delia, shown in a relational database as Person, Friend, and Person-Friend tables, and in a graph database as Person nodes connected by KNOWS relationships.]
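The contrast between the relational and graph models can be sketched in plain Python. This is only an analogy, not how Neo4j stores data. The relational side resolves friendships through a join table of id pairs, while the graph side lets each person point straight at the people they KNOW. The names come from the slide. Which pairs know each other, and all function names, are hypothetical.

```python
from collections import defaultdict

# Relational style: a Person table plus a Person-Friend join table of id pairs.
# Names come from the slide; which pairs know each other is hypothetical.
person = {1: "Andreas", 2: "Tobias", 3: "Mica", 4: "Delia"}
person_friend = [(1, 2), (1, 3), (3, 4)]

def friends_relational(name):
    """Find the person's id, scan the join table, then map ids back to names."""
    pid = next(k for k, v in person.items() if v == name)
    return sorted(person[b] for a, b in person_friend if a == pid)

# Graph style: each name keeps a direct list of the names it KNOWS.
knows = defaultdict(list)
for a, b in person_friend:
    knows[person[a]].append(person[b])

def friends_graph(name):
    """Follow KNOWS links directly; no join-table lookup is needed."""
    return sorted(knows[name])

def friends_of_friends(name):
    """Two hops: follow the links twice."""
    return sorted({fof for friend in knows[name] for fof in knows[friend]})

print(friends_graph("Andreas"))       # ['Mica', 'Tobias']
print(friends_of_friends("Andreas"))  # ['Delia']
```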

A way of representing data

Property Graph Model

The Whiteboard Model Is the Physical Model

Property Graph Model Components

Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled

Relationships
• Relate nodes by type and direction
• Can have name-value properties

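[Figure: two PERSON nodes, Dan (born: May 29, 1970; twitter: "@dan") and Ann (born: Dec 5, 1975), connected by LOVES and LIVES WITH relationships, plus a CAR node (brand: "Volvo", model: "V70") connected by DRIVES and OWNS relationships; one relationship carries the property since: Jan 10, 2011.]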
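The property graph components above map onto two small record types. Below is a minimal Python sketch, assuming plain dataclasses rather than any database API. Nodes carry labels and name-value properties. Relationships carry a type, a direction (start to end), and their own properties. The slide text does not say which person owns or drives the car, or which relationship has the `since` property, so those assignments are assumptions. `LIVES_WITH` is an illustrative spelling of "LIVES WITH".

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    labels: set                                     # e.g. {"Person"}
    properties: dict = field(default_factory=dict)  # name-value pairs

@dataclass
class Relationship:
    type: str                                       # e.g. "LOVES"
    start: Node                                     # relationships are directed
    end: Node
    properties: dict = field(default_factory=dict)

dan = Node({"Person"}, {"name": "Dan", "born": "1970-05-29", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "1975-12-05"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

# Assumption for illustration: who OWNS or DRIVES the car, and which
# relationship carries "since", is not stated in the extracted slide text.
graph = [
    Relationship("LOVES", dan, ann),
    Relationship("LOVES", ann, dan),
    Relationship("LIVES_WITH", dan, ann),
    Relationship("OWNS", dan, car, {"since": "2011-01-10"}),
    Relationship("DRIVES", ann, car),
]

def outgoing(node, rel_type):
    """Return end nodes of relationships of rel_type that start at node."""
    return [r.end for r in graph if r.start is node and r.type == rel_type]

print([n.properties["name"] for n in outgoing(dan, "LOVES")])  # ['Ann']
```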

Cypher Query Language

SQL for graphs

Cypher: Powerful and Expressive Query Language

CREATE (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})

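[Figure: the CREATE statement annotated: each (:Person {name: ...}) is a NODE with a LABEL (Person) and a PROPERTY (name), and the LOVES relationship points from Dan to Ann.]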

MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate, count(report) AS Total

Express Complex Queries Easily with Cypher

Find all direct reports and how many people they manage, up to 3 levels down

[Figure: the Cypher query above shown next to the equivalent SQL query.]
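To show what the variable-length patterns in the MANAGES query compute, here is a plain-Python sketch of the same traversal over a hypothetical org chart. `*0..3` includes the boss at zero hops. `*1..3` finds each subordinate's reports one to three levels down. The sketch counts one row per (sub, report) match, as `count(report)` does. It assumes the hierarchy is a tree, so counting paths equals counting people. All names are made up.

```python
from collections import Counter

# Hypothetical org chart: manager -> direct reports (MANAGES relationships).
# Assumed to be a tree, so each report is reached by exactly one path and
# counting matched rows, as count(report) does, equals counting people.
manages = {
    "John Doe": ["Alice", "Bob"],
    "Alice": ["Carol", "Dave"],
    "Carol": ["Erin"],
}

def reachable(start, min_hops, max_hops):
    """Yield nodes at the end of a MANAGES path of min_hops..max_hops hops."""
    frontier = [start]
    for depth in range(max_hops + 1):
        if depth >= min_hops:
            yield from frontier
        frontier = [child for node in frontier for child in manages.get(node, [])]

def report_counts(boss_name):
    """Mirror (boss)-[:MANAGES*0..3]->(sub), (sub)-[:MANAGES*1..3]->(report)."""
    counts = Counter()
    for sub in reachable(boss_name, 0, 3):      # *0..3 includes the boss itself
        for _report in reachable(sub, 1, 3):    # one row per (sub, report) match
            counts[sub] += 1
    return dict(counts)                         # subs with no reports yield no rows

print(report_counts("John Doe"))  # {'John Doe': 5, 'Alice': 3, 'Carol': 1}
```

Bob, Dave, and Erin do not appear because a subordinate with no reports produces no matched rows, just as in the Cypher query.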

Graphing US Congress

Demo

https://github.com/legis-graph/legis-graph


LOAD CSV WITH HEADERS FROM "file:///legislators.csv" AS line
MERGE (l:Legislator {thomasID: line.thomasID})
SET l = line
MERGE (s:State {code: line.state})
MERGE (l)-[:REPRESENTS]->(s)
…
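The important property of `MERGE` in the import above is idempotence: each node or relationship is created only if it does not already exist, so legislators from the same state share one State node. The sketch below mimics that behavior with dictionaries and a set. It is an analogy, not Neo4j's implementation. The CSV contents are made up. Only the `thomasID` and `state` columns come from the slide.

```python
import csv
import io

# Hypothetical stand-in for legislators.csv; only the thomasID and state
# columns are taken from the slide's query, the rows are made up.
LEGISLATORS_CSV = """thomasID,name,state
00001,Legislator A,CO
00002,Legislator B,CO
00003,Legislator C,WY
"""

legislators = {}    # thomasID -> properties, like (:Legislator {thomasID: ...})
states = {}         # state code -> properties, like (:State {code: ...})
represents = set()  # (thomasID, code) pairs, like (l)-[:REPRESENTS]->(s)

def load(text):
    for line in csv.DictReader(io.StringIO(text)):
        # MERGE on the key: reuse the existing entry, create it only if missing.
        legislator = legislators.setdefault(line["thomasID"], {})
        legislator.clear()
        legislator.update(line)  # SET l = line: replace properties with the row
        states.setdefault(line["state"], {"code": line["state"]})
        # A set ignores duplicate pairs, so the relationship is not doubled.
        represents.add((line["thomasID"], line["state"]))

load(LEGISLATORS_CSV)
load(LEGISLATORS_CSV)  # loading twice changes nothing, as with MERGE
print(len(legislators), len(states), len(represents))  # 3 2 3
```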

US Congress


http://legis-graph.github.io/legis-graph-spatial/

[Figure: the Congress graph extended with contributions, committees, and candidates data.]

https://gist.github.com/johnymontana/02ae47fc0a29719db045

Takeaway: Graph data models are easy to evolve!
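One way to see why evolving the model is cheap: new kinds of nodes and relationships are simply added next to the existing ones, and queries over the old relationship types keep working unchanged. Here is a minimal sketch with the graph kept as (start, type, end) triples. All node and relationship names are hypothetical.

```python
# Hypothetical graph stored as (start, relationship type, end) triples.
graph = [
    ("Legislator A", "REPRESENTS", "CO"),
    ("Legislator A", "SERVES_ON", "HSAP"),
]

def neighbours(node, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [end for start, rtype, end in graph if start == node and rtype == rel_type]

before = neighbours("Legislator A", "SERVES_ON")

# Evolve the model: add new kinds of nodes and relationships (hypothetical
# names) without migrating or rewriting any existing data.
graph += [
    ("Legislator A", "IS_CANDIDATE", "Candidate A"),
    ("Committee X", "CONTRIBUTED_TO", "Candidate A"),
]

after = neighbours("Legislator A", "SERVES_ON")
print(before == after)  # True: the existing query is unaffected
```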

Demo: Hillary Clinton Emails

Clinton email graph model

Data munging

http://graphics.wsj.com/hillary-clinton-email-documents/

Data munging

https://github.com/OpenRefine/OpenRefine/wiki/Faceting
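Faceting helps with data munging because it surfaces inconsistent spellings of the same value. Below is a rough Python analogy, not OpenRefine itself. A text facet is a count per distinct raw value. A simplified fingerprint key (lowercase, strip punctuation, sort unique tokens), loosely modeled on OpenRefine's key-collision clustering, then groups likely duplicates. The sender strings are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sender values with inconsistent formatting.
senders = ["Doe, Jane", "Jane Doe", "jane  doe", "Roe, Rick", "Roe, Rick"]

# A text facet: one bucket per distinct raw value, with its count.
facet = Counter(senders)

def fingerprint(value):
    """Lowercase, drop punctuation, then sort and de-duplicate the tokens."""
    tokens = re.sub(r"[^\w\s]", "", value.strip().lower()).split()
    return " ".join(sorted(set(tokens)))

# Group raw values whose fingerprints collide; these are likely the same sender.
clusters = defaultdict(set)
for value in senders:
    clusters[fingerprint(value)].add(value)

print(facet.most_common(1))          # [('Roe, Rick', 2)]
print(sorted(clusters["doe jane"]))  # ['Doe, Jane', 'Jane Doe', 'jane  doe']
```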

LOAD CSV - Cypher

http://www.developeradvocate.com/2015/11/graphing-hillary-clinton-email/
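Once emails are loaded as a graph, "who emails whom, and how often" becomes a path-counting question. The sketch below assumes a hypothetical model, (:Person)-[:SENT]->(:Email)-[:TO]->(:Person), and hypothetical CSV columns; the actual model and dataset fields in the linked post may differ. It collapses each sender-email-recipient path into a weighted sender/recipient pair.

```python
import csv
import io
from collections import Counter

# Hypothetical rows and columns; the real dataset's fields may differ.
EMAILS_CSV = """id,from,to,subject
1,Person A,Person B,Schedule
2,Person A,Person B,Travel
3,Person B,Person C,Schedule
"""

# Hypothetical model: (:Person)-[:SENT]->(:Email)-[:TO]->(:Person),
# kept here as two maps keyed by email id.
sent_by = {}
sent_to = {}
for row in csv.DictReader(io.StringIO(EMAILS_CSV)):
    sent_by[row["id"]] = row["from"]
    sent_to[row["id"]] = row["to"]

# Collapse each Person-SENT->Email-TO->Person path into a weighted pair.
pairs = Counter((sent_by[eid], sent_to[eid]) for eid in sent_by)
print(pairs.most_common(1))  # [(('Person A', 'Person B'), 2)]
```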

Clinton email graph model

bit.ly/1R1ybyu

Content mining

“Networks give structure to the conversation while content mining gives meaning.”

http://breakthroughanalysis.com/2015/10/08/ltapreriitsouda/

- Preriit Souda

Extracting topics from email text


http://www.markhneedham.com/blog/2015/02/13/neo4j-building-a-topic-graph-with-prismatic-interest-graph-api/
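Extracted topics can be attached to emails as Topic nodes, and topics that appear in the same email become linked. The linked post used the Prismatic Interest Graph API for extraction. The sketch below swaps in plain keyword matching against a hypothetical vocabulary, purely to show the graph structure that results. The email bodies and topic names are made up.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical topic vocabulary and email bodies. Plain keyword matching is a
# stand-in here; the linked post used the Prismatic Interest Graph API instead.
TOPICS = {"Energy": {"oil", "energy"}, "Diplomacy": {"embassy", "ambassador"}}
emails = {
    "e1": "The ambassador discussed oil prices.",
    "e2": "Embassy staff meeting moved to Friday.",
    "e3": "Energy policy draft attached.",
}

def topics_for(text):
    """Return the topics whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return {topic for topic, keywords in TOPICS.items() if words & keywords}

# (:Email)-[:HAS_TOPIC]->(:Topic), kept as email id -> set of topics.
has_topic = {eid: topics_for(body) for eid, body in emails.items()}

# Topics mentioned in the same email become weighted topic-topic links.
cooccurrence = Counter(
    pair for topics in has_topic.values() for pair in combinations(sorted(topics), 2)
)
print(cooccurrence)  # Counter({('Diplomacy', 'Energy'): 1})
```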

Clinton email graph model


http://bit.ly/1R1ybyu

Resources

Visualization

https://linkurio.us/

http://visjs.org/

http://neo4j.com/developer/guide-data-visualization/

Data analysis with Neo4j

Py2neo http://py2neo.org/2.0/

IPython Notebook https://github.com/versae/ipython-cypher

R-lang http://neo4j.com/developer/r/

ICIJ Case Study: Swiss Leaks

https://youtu.be/4__ni4aC8gI http://neo4j.com/case-studies/icij/

graphdatabases.com
