Using A Distributed Graph Database To Make Sense Of Disparate Data Stores
DESCRIPTION
Presented at DataWeek SF, Oct 13.
Most analytics depend on data mining and statistical correlation of information held in single data stores. It is generally inefficient to replicate diverse data, which may be stored in enterprise databases or NoSQL "Big Data" repositories, and consolidate it using a single database technology. Although federated queries can help with statistical correlation of data values across data stores, the technique is not very good at handling the data stored in relationships, because the data stores generally have no knowledge of one another. The speaker describes a different approach that uses graph (relationship) analytics to extract structural data from existing repositories, store representations of the nodes and connections in a graph database, then analyze them to extract additional value.
TRANSCRIPT
Copyright © Objectivity, Inc. 2013
Using A Distributed Graph Database To Make Sense Of Disparate Data Stores
Leon Guzenda Dataweek
San Francisco – October 2, 2013
Current Big Data Analytics
Graph Analytics
InfiniteGraph
The ETL & Discovery Process
Objectivity Inc.
• Objectivity, Inc. is headquartered in Sunnyvale, CA.
• Objectivity has over two decades of Big Data and NoSQL experience
• We develop NoSQL platforms for managing and discovering relationships and patterns in complex data:
–Objectivity/DB - an object database that manages localized, centralized or distributed databases
–InfiniteGraph - a massively scalable graph database built on Objectivity/DB that enables organizations to find, store and exploit the relationships in their data
• Millions of deployments - Our technology is embedded in hundreds of enterprise and government systems and commercial products
A Typical Objectivity Deployment - Sensor Data Fusion
Network Centric Collaborative Targeting
A Typical InfiniteGraph Deployment - GraphMyLife
A Typical “Big Data” Analytics Setup
Data Aggregation and Analytics Applications
Commodity Linux Platforms and/or High Performance Computing Clusters
Data: Structured, Semi-Structured, Unstructured
Stores: Graph DB, Object DB, Doc DB, K-V Store, Hadoop, Column Store, Data W/H, RDBMS
Incremental Analytics Improvements Aren’t Enough
• All current solutions use the same basic architectural model
• None of the popular solutions has an efficient way to store connections between entities in different silos
• Most analytic technology focuses on the content of the data nodes, rather than the many kinds of connections between the nodes and the data in those connections
• Why? Because traditional and earlier NoSQL solutions are bad at handling relationships.
• Graph databases can efficiently store, manage and query the many kinds of relationships hidden in the data.
Graph Analytics
Graph (Relationship) Analytics...
A SQL Shortcoming
Think about the SQL query for finding all links between the two “blue” rows... it's hard!!
Table_A Table_B Table_C Table_D Table_E Table_F Table_G
There are some kinds of complex relationship handling problems that SQL wasn't designed for.
...Graph Analytics
InfiniteGraph - The solution can be found with a few lines of code
A SQL Shortcoming
A3 G4
Table_A Table_B Table_C Table_D Table_E Table_F Table_G
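To illustrate why this kind of question suits a graph, here is a minimal Python sketch. It is not InfiniteGraph code. It treats each row as a node and each cross-table reference as a link, then enumerates every path between rows A3 and G4 that never revisits a row. Only A3 and G4 come from the slide; the intermediate rows and the links between them are invented for the example.

```python
from collections import defaultdict

# Hypothetical cross-table references. Only A3 and G4 appear on the slide;
# every other row and every link is made up for illustration.
LINKS = [
    ("A3", "B1"), ("B1", "C7"), ("C7", "G4"),
    ("A3", "D2"), ("D2", "E5"), ("E5", "F9"), ("F9", "G4"),
    ("B1", "E5"),
]


def build_adjacency(links):
    """Treat every link as undirected: either row can lead to the other."""
    adjacency = defaultdict(set)
    for left, right in links:
        adjacency[left].add(right)
        adjacency[right].add(left)
    return adjacency


def all_simple_paths(adjacency, start, goal, path=None):
    """Depth-first enumeration of every path that never revisits a row."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbor in sorted(adjacency[start]):
        if neighbor not in path:
            yield from all_simple_paths(adjacency, neighbor, goal, path)


paths = list(all_simple_paths(build_adjacency(LINKS), "A3", "G4"))
for found in paths:
    print(" -> ".join(found))
```

The equivalent SQL would need a separate self-join pattern for each possible path length and table order, which is the difficulty the slide points at.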
Applications for Graph Analytics
LOGISTICS HEALTHCARE INFORMATICS
MARKET ANALYSIS SOCIAL NETWORK ANALYSIS
Representing the Graph...
The existing COMINT and HUMINT data might look like this:
Events/Places and People/Orgs: Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Situation X, Situation Y, Bank X, Cafe C, Target T
Facts:
• A Banks At X
• A Called P
• A Seen At Y
• A Seen Near X
• A Eats At C
• P Emailed S
• P Called Q
• P Called R
• Q Seen Near T
• R Seen Near T
• S Seen Near T
• X Paid S
Representing the Graph...
We start by identifying the nodes (Vertices) and the connections (Edges)
NODES: Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Situation X, Situation Y, Bank X, Cafe C, Target T
CONNECTIONS:
• A Banks At X
• A Called P
• A Seen At Y
• A Seen Near X
• A Eats At C
• P Emailed S
• P Called Q
• P Called R
• Q Seen Near T
• R Seen Near T
• S Seen Near T
• X Paid S
VERTEX EDGE 2 N
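The step of identifying nodes and connections can be sketched in a few lines of Python. This is an illustrative model only, not the InfiniteGraph API: the `Edge` class and `to_graph` function are hypothetical names. The slide uses "X" for both Situation X and Bank X; reading "A Seen Near X" as Situation X and the banking facts as Bank X is an interpretation of the diagram.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One connection: source vertex, relationship kind, target vertex."""
    source: str
    kind: str
    target: str


# The slide's facts as (subject, relationship, object) triples.
# Which "X" each fact refers to is an assumption (see the note above).
FACTS = [
    ("Combatant A", "Banks At", "Bank X"),
    ("Combatant A", "Called", "Civilian P"),
    ("Combatant A", "Seen At", "Situation Y"),
    ("Combatant A", "Seen Near", "Situation X"),
    ("Combatant A", "Eats At", "Cafe C"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Civilian S", "Seen Near", "Target T"),
    ("Bank X", "Paid", "Civilian S"),
]


def to_graph(facts):
    """Every subject and object becomes a vertex; every fact becomes an edge."""
    vertices = set()
    edges = []
    for source, kind, target in facts:
        vertices.update((source, target))
        edges.append(Edge(source, kind, target))
    return vertices, edges


vertices, edges = to_graph(FACTS)
print(len(vertices), "vertices,", len(edges), "edges")
```

Keeping the relationship kind on the edge, rather than as a property of either vertex, is what lets later queries filter by connection type.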
...Representing the Graph..
“Nodes” “Connections”
...Representing the Graph..
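VERTEX (“Nodes”) and EDGE (“Connections”):
• Combatant A --Seen Near--> Situation X
• Combatant A --Seen At--> Situation Y
• Combatant A --Called--> Civilian P
• Combatant A --Banks At--> Bank X
• Combatant A --Eats At--> Cafe C
• Civilian P --Called--> Civilian Q
• Civilian P --Called--> Civilian R
• Civilian P --Emailed--> Civilian S
• Bank X --Paid--> Civilian S
• Civilian Q --Seen Near--> Target T
• Civilian R --Seen Near--> Target T
• Civilian S --Seen Near--> Target T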
...Analyzing the Graph...
[Diagram: the graph of nodes and connections from the preceding slide]
...Threat Analysis
[Diagram: the same graph without the Cafe C node, carrying the labels SUSPECTS and NEEDS PROTECTION]
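One way to analyze such a graph is to enumerate every chain of connections from Combatant A to Target T and collect the nodes that sit on those chains. The Python sketch below does that over the facts from the earlier slides. It is an illustration under assumptions, not the speaker's method: the transcript does not say which nodes the slide marks as SUSPECTS or NEEDS PROTECTION, the function name `paths_between` is hypothetical, and the reading of "X" as Situation X or Bank X is an interpretation.

```python
from collections import defaultdict

# Facts from the slides as directed (source, relationship, target) triples.
EDGES = [
    ("Combatant A", "Banks At", "Bank X"),
    ("Combatant A", "Called", "Civilian P"),
    ("Combatant A", "Seen At", "Situation Y"),
    ("Combatant A", "Seen Near", "Situation X"),
    ("Combatant A", "Eats At", "Cafe C"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Civilian S", "Seen Near", "Target T"),
    ("Bank X", "Paid", "Civilian S"),
]


def paths_between(edges, start, goal):
    """Return every directed path from start to goal that repeats no node.

    Each path is a list of (relationship, node) steps taken after start.
    """
    outgoing = defaultdict(list)
    for source, kind, target in edges:
        outgoing[source].append((kind, target))

    found = []

    def walk(node, trail, visited):
        if node == goal:
            found.append(trail)
            return
        for kind, nxt in outgoing.get(node, []):
            if nxt not in visited:
                walk(nxt, trail + [(kind, nxt)], visited | {nxt})

    walk(start, [], {start})
    return found


paths = paths_between(EDGES, "Combatant A", "Target T")
intermediaries = sorted({node for trail in paths for _, node in trail} - {"Target T"})

for trail in paths:
    print("Combatant A" + "".join(f" -[{kind}]-> {node}" for kind, node in trail))
print("On a path to the target:", intermediaries)
```

Situation X, Situation Y and Cafe C never appear in the result because no chain of connections leads from them to Target T.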
Visual Analytics
Graphs Can Scale Very Quickly
We often hear about the “trillion row” database. Amazon S3 has reached 2 trillion, but one Objectivity site:
• Processes 10s of trillions of objects per day
• Supports over 1000 analysts around the clock.
Consider a graph where each node has 10 connections:
• At 6 degrees of separation, finding a path between two nodes may require traversing a million links.
• 9 degrees of separation may require a billion traversals
• 12 degrees of separation may require a trillion traversals
• 15 degrees of separation may require a quadrillion traversals...
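The figures above follow from simple exponential growth: with 10 connections per node, each extra hop multiplies the number of links to consider by 10. A short Python check, where the function name is just an illustrative choice and the result is an order-of-magnitude estimate of the links at the outermost hop:

```python
def links_at_depth(connections_per_node: int, hops: int) -> int:
    """Estimate the links reachable at exactly `hops` steps from one node,
    assuming every node has the same number of connections."""
    return connections_per_node ** hops


for hops in (6, 9, 12, 15):
    print(f"{hops:>2} hops: {links_at_depth(10, hops):,} links")
```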
THE ETL & DISCOVERY PROCESS
Not Only SQL – A group of 4 primary technologies
Simple Highly Interconnected
InfiniteGraph - The Enterprise Graph Database
• A high performance distributed database engine that supports analyst-time decision support and actionable intelligence
• Cost effective link analysis – flexible deployment on commodity resources (hardware and OS).
• Efficient, scalable, risk averse technology – enterprise proven.
• High speed parallel ingest to load graph data quickly.
• Parallel, distributed queries
• Flexible plugin architecture
• Complementary technology
• Fast proof of concept – easy to use Graph API.
InfiniteGraph Capabilities
• Parallel Graph Traversal
• Inclusive or Exclusive Selection
• Shortest or All Paths Between Objects
• Compute Cost To Date
• Visualize
• Computational & Visualization Plug-Ins
A Powerful InfiniteGraph Query
Problem: Find the cheapest route for moving a 200 ton load from San Francisco to San Jose
[Map: San Francisco, Oakland, Pacifica, Hillsboro, Half Moon Bay, Palo Alto, Cupertino, San Jose; connection types: Water, Rail, Road]
// Policies: Depth_First, Exclude Railway_Edge, Exclude_Road_Edge
// Calculate: Cost_To_This_City()
// Navigate: From “San Francisco” To “San Jose”
// Visualizer: Map_Cheapest_Route
// Visualizer: List_Cost_Breakdown.
// Note: This is pseudocode, not the actual Java statements.
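For readers who want something runnable, the same idea can be sketched in plain Python. This is not the InfiniteGraph API. It swaps in Dijkstra's algorithm in place of the depth-first policy named in the pseudocode, and applies the exclusion policy by skipping edges of the excluded kinds. The city names come from the slide; every link and cost below is invented, and `cheapest_route` is a hypothetical helper.

```python
import heapq

# Hypothetical network: (from, to, kind, cost of moving the load).
# City names are from the slide; links and costs are made up.
ROUTES = [
    ("San Francisco", "Oakland", "Water", 40),
    ("Oakland", "San Jose", "Water", 90),
    ("San Francisco", "Palo Alto", "Water", 70),
    ("Palo Alto", "San Jose", "Water", 50),
    ("San Francisco", "San Jose", "Rail", 60),
    ("San Francisco", "Half Moon Bay", "Road", 30),
    ("Half Moon Bay", "San Jose", "Road", 45),
]


def cheapest_route(routes, start, goal, excluded_kinds=frozenset()):
    """Dijkstra's algorithm over the edges whose kind is not excluded.

    Returns (total_cost, [cities]) or None if the goal is unreachable.
    """
    graph = {}
    for a, b, kind, cost in routes:
        if kind in excluded_kinds:
            continue  # policy: ignore this type of connection entirely
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    queue = [(0, start, [start])]  # (cost so far, city, path taken)
    settled = set()
    while queue:
        cost, city, path = heapq.heappop(queue)
        if city == goal:
            return cost, path
        if city in settled:
            continue
        settled.add(city)
        for nxt, step in graph.get(city, []):
            if nxt not in settled:
                heapq.heappush(queue, (cost + step, nxt, path + [nxt]))
    return None


cost, path = cheapest_route(
    ROUTES, "San Francisco", "San Jose", excluded_kinds={"Rail", "Road"}
)
print(cost, " -> ".join(path))
```

With rail and road excluded, only the water links remain, so the exclusion policy changes the answer as well as the search space.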
Recognizing Graphs In Object Models...
• Tree Structures
• Graph (Network) Structures
• Relationship Data: 1-to-Many and Many-to-Many links between objects of Object Class A
...Recognizing Graphs In Object Models
• Tree Structures
• Graph (Network) Structures
• Relationship Data: 1-to-Many and Many-to-Many links between objects of Object Class A
[Diagram: the object models above labeled with VERTEX and EDGE to form the GRAPH MODEL]
The ETL Process
ETL Tools/Applications
Commodity Linux Platforms and/or High Performance Computing Clusters
Data: Structured, Semi-Structured, Unstructured
Stores: Object DB, Graph DB, Doc DB, K-V Store, Hadoop, Column Store, Data W/H, RDBMS
Extracted: Nodes & Edges
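A minimal sketch of this extraction step, in Python, might look like the following. Everything in it is hypothetical: the two in-memory collections stand in for a relational table and a document store, and the field names, node keys and edge labels are invented for the example. The point is that the graph keeps only keys and relationships, while the full records stay in their original stores.

```python
# Hypothetical source records standing in for two separate data stores.
RDBMS_ORDERS = [  # rows from a relational table
    {"order_id": 1, "customer_id": "c1", "product_id": "p9"},
    {"order_id": 2, "customer_id": "c2", "product_id": "p9"},
]
DOC_STORE_PROFILES = {  # documents from a document database
    "c1": {"name": "Ann", "follows": ["c2"]},
    "c2": {"name": "Bob", "follows": []},
}


def extract_graph(orders, profiles):
    """Pull node keys and connections out of both stores into one graph.

    Nodes are keyed by (type, id) and record which store they came from.
    Edges are (source_key, label, target_key) triples.
    """
    nodes = {}
    edges = set()

    for customer_id, doc in profiles.items():
        key = ("customer", customer_id)
        nodes[key] = {"store": "doc_db", "name": doc["name"]}
        for other in doc["follows"]:
            edges.add((key, "FOLLOWS", ("customer", other)))

    for row in orders:
        customer = ("customer", row["customer_id"])
        product = ("product", row["product_id"])
        nodes.setdefault(customer, {"store": "rdbms"})
        nodes.setdefault(product, {"store": "rdbms"})
        edges.add((customer, "ORDERED", product))

    return nodes, edges


nodes, edges = extract_graph(RDBMS_ORDERS, DOC_STORE_PROFILES)
print(len(nodes), "nodes,", len(edges), "edges")
```

Once both silos feed the same node keys, a single traversal can follow a FOLLOWS edge from one store and an ORDERED edge from the other, which neither store could answer alone.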
Commonly Used Graph Algorithms...
• Connectedness
• Node degree
• Shortest Path
• Average path length
• Transitive Closure
• Graph diameter (or Span)
• Centrality (Betweenness, Degree and Closeness)
In the slide's example graph, node D has the highest betweenness centrality
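Three of these measures can be sketched in plain Python. The slide's example graph is not reproduced in the transcript, so the small graph below is invented, chosen so that its node D also has the highest betweenness centrality. Shortest path uses breadth-first search; betweenness uses Brandes' algorithm for unweighted graphs. All function names are illustrative.

```python
from collections import deque

# Hypothetical undirected graph; not the one shown on the slide.
EDGES = [("A", "B"), ("A", "D"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F"), ("E", "G")]


def build_graph(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def degree(graph, node):
    """Node degree: the number of connections a node has."""
    return len(graph[node])


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def betweenness(graph):
    """Brandes' algorithm: for each node, the number of shortest paths
    between other node pairs that pass through it (unnormalized)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        order = []                                # nodes by distance
        preds = {v: [] for v in graph}            # shortest-path parents
        sigma = dict.fromkeys(graph, 0)           # shortest-path counts
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):                 # farthest nodes first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Undirected graph: every pair was counted from both ends.
    return {v: s / 2 for v, s in score.items()}


GRAPH = build_graph(EDGES)
scores = betweenness(GRAPH)
print("degree of D:", degree(GRAPH, "D"))
print("shortest A to G:", shortest_path(GRAPH, "A", "G"))
print("highest betweenness:", max(scores, key=scores.get))
```

In this invented graph D scores highest because every shortest path between the A, B, C side and the E, F, G side runs through it.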
A Typical Deployment Supplements Traditional or Big Data Systems With Graph Analytics
[Diagram: Data Visualization & Analytics; Big Data Connection Platform; Conventional & Relationship Analytics; ORACLE Big Data Solutions; vendor logos footnoted *Now HP and *Now IBM]
Online Demo - Call Detail Record Analysis
Used in law enforcement, counter-terrorism and Customer Resource Management
Thank You!
Please take a look at objectivity.com for InfiniteGraph Online Demos, White Papers, Free Downloads, Samples & Tutorials, and visit our booth for a demonstration.