Using A Distributed Graph Database To Make Sense Of Disparate Data Stores



DESCRIPTION

Presented at DataWeek SF, October 2013. Most analytics depend on data mining and statistical correlation of information held in single data stores. It is generally inefficient to replicate diverse data, which may be stored in enterprise databases or NoSQL "Big Data" repositories, and consolidate it using a single database technology. Although federated queries can help with statistical correlation of data values across data stores, the technique is not very good at handling the data stored in relationships, because the data stores generally have no knowledge of one another. The speaker describes a different approach that uses graph (relationship) analytics to extract structural data from existing repositories, store representations of the nodes and connections in a graph database, then analyze them to extract additional value.

TRANSCRIPT

Page 1: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Copyright © Objectivity, Inc. 2013

Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Leon Guzenda

DataWeek, San Francisco – October 2, 2013

Current Big Data Analytics

Graph Analytics

InfiniteGraph

The ETL & Discovery Process

Page 2: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Objectivity Inc.

• Objectivity, Inc. is headquartered in Sunnyvale, CA.
• Objectivity has over two decades of Big Data and NoSQL experience.
• We develop NoSQL platforms for managing and discovering relationships and patterns in complex data:
  – Objectivity/DB - an object database that manages localized, centralized or distributed databases
  – InfiniteGraph - a massively scalable graph database built on Objectivity/DB that enables organizations to find, store and exploit the relationships in their data
• Millions of deployments - Our technology is embedded in hundreds of enterprise and government systems and commercial products.

Page 3: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

A Typical Objectivity Deployment - Sensor Data Fusion

Network Centric Collaborative Targeting

Page 4: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

A Typical InfiniteGraph Deployment - GraphMyLife

Page 5: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

A Typical “Big Data” Analytics Setup

[Figure: Data Aggregation and Analytics Applications run on Commodity Linux Platforms and/or High Performance Computing Clusters, over Structured, Semi-Structured and Unstructured data held in RDBMS, Data W/H, Column Store, Hadoop, K-V Store, Doc DB, Object DB and Graph DB]

Page 6: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Incremental Analytics Improvements Aren’t Enough

All current solutions use the same basic architectural model

• None of the popular solutions have an efficient way to store connections between entities in different silos.
• Most analytic technology focuses on the content of the data nodes, rather than the many kinds of connections between the nodes and the data in those connections.
• Why? Because traditional and earlier NoSQL solutions are bad at handling relationships.
• Graph databases can efficiently store, manage and query the many kinds of relationships hidden in the data.

Presenter
Presentation Notes
Thinking we should be less about Objy in the last bullet… possibly Object oriented and graph databases… ?
Page 7: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Graph Analytics

Page 8: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Graph (Relationship) Analytics...

A SQL Shortcoming

Think about the SQL query for finding all links between the two “blue” rows... it's hard!!

[Figure: Table_A, Table_B, Table_C, Table_D, Table_E, Table_F, Table_G, with two rows highlighted in blue]

There are some kinds of complex relationship handling problems that SQL wasn't designed for.

Page 9: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

...Graph Analytics

A SQL Shortcoming

InfiniteGraph - The solution can be found with a few lines of code

[Figure: Table_A through Table_G, with rows A3 and G4 marked]
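
To illustrate the "few lines of code" claim, here is a minimal sketch in plain Python. It is not the InfiniteGraph API. It assumes each table row is a vertex and each row-to-row reference is an edge. Every row name other than A3 and G4, and every link, is invented for the example. It returns one shortest chain of links rather than all of them.

```python
from collections import deque

# Hypothetical links between rows of Table_A .. Table_G (treated as undirected).
LINKS = [
    ("A3", "B1"), ("B1", "C7"), ("C7", "G4"),
    ("A3", "D2"), ("D2", "E5"), ("E5", "F9"), ("F9", "G4"),
]

def shortest_path(links, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None

path = shortest_path(LINKS, "A3", "G4")
print(path)
```

The equivalent SQL would need one self-join or recursive step per hop, and the number of hops is not known in advance.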

Page 10: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Applications for Graph Analytics

• LOGISTICS
• HEALTHCARE INFORMATICS
• MARKET ANALYSIS
• SOCIAL NETWORK ANALYSIS

Page 11: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Representing the Graph...

The existing COMINT and HUMINT data might look like this:

Events/Places and People/Orgs: Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Situation X, Situation Y, Target T, Bank X, Cafe C

Facts:
• S Seen Near T
• A Banks at X
• A Called P
• A Seen At Y
• A Seen Near X
• P Emailed S
• P Called Q
• Q Seen Near T
• P Called R
• R Seen Near T
• X Paid S
• A Eats At

Page 12: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Representing the Graph...

We start by identifying the nodes (Vertices) and the connections (Edges)

NODES (Events/Places and People/Orgs): Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Situation X, Situation Y, Target T, Bank X, Cafe C

CONNECTIONS (Facts):
• S Seen Near T
• A Banks at X
• A Called P
• A Seen At Y
• A Seen Near X
• P Emailed S
• P Called Q
• Q Seen Near T
• P Called R
• R Seen Near T
• X Paid S
• A Eats At
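
The node and connection lists above can be written down as (source, connection, target) triples. This is a minimal sketch in plain Python, not the InfiniteGraph API. It makes two assumptions: the letter X is read as Situation X for "Seen Near" and as Bank X for "Banks At" and "Paid", and the target of "A Eats At" is taken to be Cafe C.

```python
from collections import defaultdict

# Facts as (source, connection, target) triples.
# Assumption: "X" means Situation X for "Seen Near", Bank X otherwise.
FACTS = [
    ("Civilian S", "Seen Near", "Target T"),
    ("Combatant A", "Banks At", "Bank X"),
    ("Combatant A", "Called", "Civilian P"),
    ("Combatant A", "Seen At", "Situation Y"),
    ("Combatant A", "Seen Near", "Situation X"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Bank X", "Paid", "Civilian S"),
    ("Combatant A", "Eats At", "Cafe C"),  # target assumed to be Cafe C
]

# Vertices are every name that appears at either end of a fact.
vertices = sorted({name for source, _, target in FACTS for name in (source, target)})

# Edges are grouped by source vertex so they can be followed quickly.
outgoing = defaultdict(list)
for source, connection, target in FACTS:
    outgoing[source].append((connection, target))

print(len(vertices), "vertices,", len(FACTS), "edges")
```

Keeping the connection label on each edge is what later lets a query follow only some kinds of relationship.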

Page 13: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

...Representing the Graph..

[Figure: a VERTEX (“Nodes”) box linked to an EDGE (“Connections”) box, with the link labeled 2 and N]

Page 14: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

...Representing the Graph..

[Figure: the data drawn as a graph. VERTEX (“Nodes”): Situation X, Situation Y, Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Target T, Bank X, Cafe C. EDGE (“Connections”): Seen Near, Seen At, Called, Emailed, Banks At, Paid, Eats At]

Page 15: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

...Analyzing the Graph...

[Figure: the graph of nodes (Situation X, Situation Y, Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Target T, Bank X, Cafe C) and connections (Seen Near, Seen At, Called, Emailed, Banks At, Paid, Eats At)]

Page 16: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

...Threat Analysis

[Figure: the graph of nodes (Situation X, Situation Y, Combatant A, Civilian P, Civilian Q, Civilian R, Civilian S, Target T, Bank X) and connections (Seen Near, Seen At, Called, Emailed, Banks At, Paid), with nodes marked SUSPECTS and NEEDS PROTECTION]
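
The slide does not say which rule places a node in either group, so the following is only one possible reading, sketched in plain Python. The assumption is that a suspect is anyone seen near Target T who can also be reached from Combatant A by following connections. The full node names and the reading of "X" as Situation X or Bank X are likewise assumptions.

```python
# Connections as (source, connection, target) triples.
EDGES = [
    ("Civilian S", "Seen Near", "Target T"),
    ("Combatant A", "Banks At", "Bank X"),
    ("Combatant A", "Called", "Civilian P"),
    ("Combatant A", "Seen At", "Situation Y"),
    ("Combatant A", "Seen Near", "Situation X"),
    ("Civilian P", "Emailed", "Civilian S"),
    ("Civilian P", "Called", "Civilian Q"),
    ("Civilian Q", "Seen Near", "Target T"),
    ("Civilian P", "Called", "Civilian R"),
    ("Civilian R", "Seen Near", "Target T"),
    ("Bank X", "Paid", "Civilian S"),
]

def reachable(edges, start):
    """Return every vertex reachable from start along directed edges."""
    out = {}
    for source, _, target in edges:
        out.setdefault(source, set()).add(target)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in out.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

linked_to_a = reachable(EDGES, "Combatant A")
seen_near_target = {
    source for source, connection, target in EDGES
    if connection == "Seen Near" and target == "Target T"
}
# Hypothetical rule: suspects are linked to A and were seen near the target.
suspects = sorted(seen_near_target & linked_to_a)
print(suspects)
```

Under this rule the node that "needs protection" would simply be the target that the suspects converge on.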

Page 17: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Visual Analytics

Presenter
Presentation Notes
Note Object Oriented Databases as NOSQL here.
Page 18: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Graphs Can Scale Very Quickly

We often hear about the “trillion row” database. Amazon S3 has reached 2 trillion objects, but one Objectivity site:

• Processes tens of trillions of objects per day
• Supports over 1000 analysts around the clock

Consider a graph where each node has 10 connections:

• At 6 degrees of separation, finding a path between two nodes may require traversing a million links
• 9 degrees of separation may require a billion traversals
• 12 degrees of separation may require a trillion traversals
• 15 degrees of separation may require a quadrillion traversals...
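
The figures above follow from simple exponentiation: with b connections per node, a search d hops deep may touch up to b to the power d links. A minimal sketch follows. The function name is made up for illustration, and the result is an upper bound that ignores revisited nodes.

```python
def max_traversals(connections_per_node, hops):
    """Upper bound on links followed when searching `hops` levels deep."""
    return connections_per_node ** hops

for hops in (6, 9, 12, 15):
    print(f"{hops:>2} hops: {max_traversals(10, hops):,} traversals")
```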

Page 19: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

THE ETL & DISCOVERY PROCESS

Presenter
Presentation Notes
This section seems out of place.
Page 20: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Not Only SQL – A group of 4 primary technologies

[Figure: the four technologies arranged on a scale from Simple to Highly Interconnected]

Page 21: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

InfiniteGraph - The Enterprise Graph Database

• A high performance distributed database engine that supports analyst-time decision support and actionable intelligence
• Cost effective link analysis – flexible deployment on commodity resources (hardware and OS)
• Efficient, scalable, risk-averse technology – enterprise proven
• High-speed parallel ingest to load graph data quickly
• Parallel, distributed queries
• Flexible plugin architecture
• Complementary technology
• Fast proof of concept – easy to use Graph API

Page 22: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

InfiniteGraph Capabilities

• Parallel Graph Traversal
• Inclusive or Exclusive Selection
• Shortest or All Paths Between Objects
• Computational & Visualization Plug-Ins (Compute Cost To Date, Visualize)

[Figure: four small graphs with Start and Finish nodes illustrating each capability]

Page 23: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

A Powerful InfiniteGraph Query

Problem: Find the cheapest route for moving a 200 ton load from San Francisco to San Jose

[Figure: map of San Francisco, Oakland, Pacifica, Half Moon Bay, Hillsboro, Palo Alto, Cupertino and San Jose, connected by Water, Rail and Road links]

// Note: This is pseudocode, not the actual Java statements.
// Policies: Depth_First, Exclude_Railway_Edge, Exclude_Road_Edge
// Calculate: Cost_To_This_City()
// Navigate: From “San Francisco” To “San Jose”
// Visualizer: Map_Cheapest_Route
// Visualizer: List_Cost_Breakdown
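
For readers who want something runnable, the following is a minimal sketch of the same kind of query in plain Python. It is not InfiniteGraph code. It swaps the slide's Depth_First policy for Dijkstra's algorithm, which is a standard way to find a cheapest route. The links and the per-ton costs are invented for the example. Only the city names, the three link kinds and the 200 ton load come from the slide.

```python
import heapq

# Hypothetical links: (city, city, kind, cost in dollars per ton).
EDGES = [
    ("San Francisco", "Oakland", "water", 2),
    ("Oakland", "San Jose", "water", 5),
    ("San Francisco", "Palo Alto", "rail", 3),
    ("Palo Alto", "San Jose", "rail", 2),
    ("San Francisco", "Pacifica", "road", 1),
    ("Pacifica", "Half Moon Bay", "road", 1),
    ("Half Moon Bay", "San Jose", "road", 4),
]

def cheapest_route(edges, start, goal, load_tons, excluded_kinds=frozenset()):
    """Dijkstra's algorithm over the links whose kind is not excluded.

    Returns (total_cost, [cities on the route]) or None if no route exists.
    """
    adjacency = {}
    for a, b, kind, cost_per_ton in edges:
        if kind in excluded_kinds:
            continue  # the "Exclude ..." policy: ignore this kind of link
        adjacency.setdefault(a, []).append((b, cost_per_ton))
        adjacency.setdefault(b, []).append((a, cost_per_ton))
    queue = [(0, start, [start])]  # (cost so far, city, route so far)
    settled = set()
    while queue:
        cost, city, route = heapq.heappop(queue)
        if city == goal:
            return cost, route
        if city in settled:
            continue
        settled.add(city)
        for neighbor, cost_per_ton in adjacency.get(city, []):
            if neighbor not in settled:
                step = cost_per_ton * load_tons
                heapq.heappush(queue, (cost + step, neighbor, route + [neighbor]))
    return None

water_only = cheapest_route(EDGES, "San Francisco", "San Jose", 200, {"rail", "road"})
any_kind = cheapest_route(EDGES, "San Francisco", "San Jose", 200)
print(water_only)
print(any_kind)
```

The running cost carried in the queue plays the role of the slide's Cost_To_This_City() calculation, and the excluded_kinds argument plays the role of its exclusion policies.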

Page 24: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Recognizing Graphs In Object Models...

[Figure: object-model shapes that contain graphs: Tree Structures, Graph (Network) Structures, and Relationship Data (1-to-Many and Many-to-Many) between instances of Object Class A]

Page 25: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

...Recognizing Graphs In Object Models

[Figure: the same shapes (Tree Structures, Graph (Network) Structures, and 1-to-Many and Many-to-Many Relationship Data between instances of Object Class A) mapped onto a GRAPH MODEL of VERTEX and EDGE]
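
As an illustration of that mapping, here is a minimal sketch in plain Python. It assumes each object becomes a vertex and each reference in a 1-to-many relationship becomes an edge. The Node class, its field names and the "has_child" label are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """Hypothetical application object holding a 1-to-many relationship."""
    name: str
    children: list = field(default_factory=list)

def to_graph(root):
    """Walk the object tree and emit (vertices, edges) for a graph model."""
    vertices, edges = [], []
    stack, visited = [root], set()
    while stack:
        obj = stack.pop()
        if id(obj) in visited:  # guards against shared or cyclic references
            continue
        visited.add(id(obj))
        vertices.append(obj.name)
        for child in obj.children:
            edges.append((obj.name, "has_child", child.name))
            stack.append(child)
    return vertices, edges

root = Node("a0", [Node("a1"), Node("a2")])
vertices, edges = to_graph(root)
print(vertices, edges)
```

A many-to-many relationship maps the same way, with one edge per pair of related objects.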

Page 26: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

The ETL Process

[Figure: ETL Tools/Applications, running on Commodity Linux Platforms and/or High Performance Computing Clusters, take Nodes & Edges from the Structured, Semi-Structured and Unstructured stores (RDBMS, Data W/H, Column Store, Hadoop, K-V Store, Doc DB, Object DB) into the Graph DB]
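
The ETL step can be sketched as follows, under stated assumptions. An in-memory SQLite table stands in for an RDBMS and a list of dictionaries stands in for a document store. The table name, column names, document fields and sample rows are all invented. Each extractor emits only the structural data (who is connected to whom), and the loader merges both streams into one set of nodes and edges.

```python
import sqlite3

def extract_from_rdbms(conn):
    """Yield edges from a hypothetical 'calls' table."""
    query = "SELECT caller, callee FROM calls ORDER BY caller, callee"
    for caller, callee in conn.execute(query):
        yield (caller, "Called", callee)

def extract_from_documents(docs):
    """Yield edges from hypothetical email documents."""
    for doc in docs:
        for recipient in doc.get("to", []):
            yield (doc["from"], "Emailed", recipient)

def load(edge_streams):
    """Merge edge streams from several stores into one graph."""
    vertices, edges = set(), set()
    for stream in edge_streams:
        for source, label, target in stream:
            vertices.update((source, target))
            edges.add((source, label, target))
    return vertices, edges

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (caller TEXT, callee TEXT)")
conn.executemany("INSERT INTO calls VALUES (?, ?)", [("A", "P"), ("P", "Q"), ("P", "R")])
docs = [{"from": "P", "to": ["S"]}]

vertices, edges = load([extract_from_rdbms(conn), extract_from_documents(docs)])
print(sorted(vertices), sorted(edges))
```

Neither store knows about the other, yet person P ends up as a single vertex that joins the call data to the email data.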

Page 27: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Commonly Used Graph Algorithms...

• Connectedness
• Node degree
• Shortest Path
• Average path length
• Transitive Closure
• Graph diameter (or Span)
• Centrality (Betweenness, Degree and Closeness)

In the graph below, node D has the highest betweenness centrality.

[Figure: example graph]
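
Betweenness centrality counts how many shortest paths between other pairs of nodes pass through a given node. The following is a minimal sketch using Brandes' algorithm for unweighted graphs. The graph is a made-up example, not the one on the slide: two triangles joined through a single node named D.

```python
from collections import deque

# Hypothetical undirected graph: triangle A-B-C and triangle E-F-G, bridged by D.
ADJACENCY = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E"],
    "E": ["D", "F", "G"], "F": ["E", "G"], "G": ["E", "F"],
}

def betweenness(adjacency):
    """Brandes' algorithm: unnormalized betweenness for an undirected graph."""
    score = {v: 0.0 for v in adjacency}
    for s in adjacency:
        order = []                           # vertices in BFS visiting order
        preds = {v: [] for v in adjacency}   # predecessors on shortest paths
        sigma = {v: 0 for v in adjacency}    # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adjacency}
        for w in reversed(order):            # accumulate from the farthest back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both ends, so halve the totals.
    return {v: total / 2 for v, total in score.items()}

scores = betweenness(ADJACENCY)
print(max(scores, key=scores.get), scores)
```

In this example every path between the two triangles must cross D, which is why it scores highest.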

Page 28: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

A Typical Deployment Supplements Traditional or Big Data Systems With Graph Analytics

[Figure: deployment diagram with the labels Data Visualization & Analytics, Big Data Connection Platform, Conventional & Relationship Analytics, and ORACLE Big Data Solutions; footnotes read “*Now HP” and “*Now IBM”]

Presenter
Presentation Notes
By having a scalable and distributed platform that can manage connections between all types of disparate data, enterprises can easily capitalize on the best tools for the job at hand.
Page 29: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Online Demo - Call Detail Record Analysis

Used in law enforcement, counter-terrorism and Customer Relationship Management

Page 30: Using A Distributed Graph Database To Make Sense Of Disparate Data Stores

Thank You!

Please take a look at objectivity.com for InfiniteGraph Online Demos, White Papers, Free Downloads, Samples & Tutorials, and visit our booth for a demonstration.