graph based data models

Soulemane Moumie @moumie.org Graph Based Data Models

Upload: moumie-soulemane

Post on 25-Jan-2017

Soulemane Moumie

@moumie.org

Graph Based Data Models

Outline

Introduction

Graph in real world

What is a Graph ?

Data Model

Graph in RDBMS

Graph-based modeling

Graph Databases

Graph Query Languages

Demo with Neo4J and OrientDB

Conclusions

Introduction

We live in a connected world. The information around us is not made of isolated pieces but of rich, connected domains.

Interconnectivity of data is an important aspect.

Early adopters of graph technology re-imagined their businesses around the value of data relationships.

These companies grew quickly from unknown startups into large corporations.

Google, LinkedIn, PayPal, Facebook, Twitter.

Graph in real world

Fraud detection: uncovering fraud rings

Ref: 1

Graph in real world

Realtime recommendation engine

Ref: 1

Graph in real world

Master data management solutions: employee hierarchy data

Ref: 1

Graph in real world

Empowering Network and IT solutions: Troubleshooting

Ref: 1

Graph in real world

Social network

Ref: 1

What is a graph ?

Graph Theory is Boring …

Ref: 2

What is a graph ? : History

Ref: 3

What is a graph ? : Definition

What is a graph ? : Type

What is a graph ? : Density

What is a graph ? : Graph storage

Graph in RDBMS ?

Ref: 6

Data model

Definition: A data model is an abstract model that organizes elements of data and standardizes how they relate to one another and to properties of the real world.

Ref: 7

Graph in RDBMS : Model

Ref: 4, 5

Graph in JSON Database ?

Ref: 8

Graph in XML Database?

Ref: 8

Graph in RDBMS ?: Issues

Storing a graph in a relational database is simple, but querying it, and traversing it in particular, can be slow: each additional hop typically requires another join, and the queries become complex.
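
The join cost can be made concrete with a small sketch. It assumes a hypothetical two-table layout (`node`, `edge`) in an in-memory SQLite database; the table and column names are illustrative and not taken from the slides. A fixed-depth traversal needs one join per hop, and an arbitrary-depth traversal needs a recursive query.

```python
import sqlite3

# Hypothetical schema: one table for nodes, one adjacency table for edges.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE node (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE edge (src INTEGER REFERENCES node(id),
                       dst INTEGER REFERENCES node(id));
    INSERT INTO node VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D');
    INSERT INTO edge VALUES (1, 2), (2, 3), (3, 4);
""")

# Exactly two hops from node 1: the edge table is joined to itself once.
# Every further hop would add one more self-join.
two_hops = conn.execute("""
    SELECT n.name
    FROM edge e1
    JOIN edge e2 ON e2.src = e1.dst
    JOIN node n  ON n.id = e2.dst
    WHERE e1.src = 1
""").fetchall()

# Arbitrary depth: a recursive common table expression repeats the join
# until no new nodes are reached.
reachable = conn.execute("""
    WITH RECURSIVE reach(id) AS (
        SELECT dst FROM edge WHERE src = 1
        UNION
        SELECT e.dst FROM edge e JOIN reach r ON e.src = r.id
    )
    SELECT n.name FROM reach JOIN node n ON n.id = reach.id
    ORDER BY n.name
""").fetchall()

print(two_hops)
print(reachable)
```

The chain A → B → C → D is deliberately tiny; on a large edge table each self-join or recursion step is a lookup against the whole table, which is where the cost comes from.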

Graph Database?

If any database can represent a graph, then what is a graph database?

NoSQL : Characteristics

NoSQL : History

Ref: 9

NoSQL : Categories

Ref: 9

Graph Database?: Definition

“A graph database is any storage system that provides index-free adjacency.”

• Each vertex serves as a “mini index” of its adjacent elements.

• No index lookups are necessary.

• The cost of a local step remains constant as the graph grows.

• Cheaper than global indexes.

Ref: 10
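
A minimal sketch of the idea, assuming a hypothetical in-memory `Vertex` class (not the storage layout of any real product): with index-free adjacency each vertex holds direct references to its neighbours, whereas the index-based variant has to consult a global structure for every step.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(eq=False)
class Vertex:
    """Hypothetical vertex that stores direct references to its neighbours."""
    name: str
    neighbors: List["Vertex"] = field(default_factory=list)


def hop_index_free(vertex: Vertex) -> List[Vertex]:
    # Local step: follow the references stored on the vertex itself.
    return vertex.neighbors


def hop_with_global_index(edge_index: Dict[str, List[str]], name: str) -> List[str]:
    # Index-based step: every hop goes through one global lookup structure.
    return edge_index.get(name, [])


alice, bob, carol = Vertex("alice"), Vertex("bob"), Vertex("carol")
alice.neighbors += [bob, carol]
bob.neighbors.append(carol)

# The same edges kept in a single global index keyed by vertex name.
edge_index = {"alice": ["bob", "carol"], "bob": ["carol"]}

local = [v.name for v in hop_index_free(alice)]
indexed = hop_with_global_index(edge_index, "alice")
print(local, indexed)
```

Both variants return the same neighbours. The Python `dict` used here is a hash table, so the sketch does not reproduce the cost difference; in a database the global index is typically a tree-like structure whose lookup cost grows with the total number of edges, while the local step depends only on the vertex's own degree.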

Graph Database?: Traversal

Ref: 11

Graph Database?: Definition

“A database that uses graph structures for semantic queries with nodes, edges and properties to represent and store data”

This definition is independent of the way the data is stored internally.

It's really the model and the implemented algorithms that matter.

Ref: 12

Graph data model : Representation

Graph data model : Building blocks

Nodes: entities

Relationships: connect entities and structure the domain

Properties: attributes and metadata

Labels: group nodes by role
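
The four building blocks can be sketched as plain data structures. The class names, field names and sample data below are assumptions made for illustration; they are not the API of Neo4j or any other product.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    """An entity, with labels for its roles and properties for its attributes."""
    id: int
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed, directed connection between two nodes, with its own properties."""
    start: int
    end: int
    type: str
    properties: Dict[str, Any] = field(default_factory=dict)


alice = Node(1, {"Person"}, {"name": "Alice"})
acme = Node(2, {"Company"}, {"name": "Acme"})
works_at = Relationship(alice.id, acme.id, "WORKS_AT", {"since": 2015})

# Labels make it easy to select nodes by role.
nodes = [alice, acme]
people = [n.properties["name"] for n in nodes if "Person" in n.labels]
print(people, works_at.type)
```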

Graph data model : ERD example

Ref: 13

Graph data model : Why ?

For applications where "interconnectivity and topology" matter.

It allows for a more natural modeling of connected data. Graph structures are visible to the user and allow a natural way of handling application data, for example hypertext or geographic data.

Queries can refer directly to the graph structure.

So we can perform graph-specific operations such as finding shortest paths or extracting subgraphs.

For implementation, graph databases may provide special graph storage structures and efficient graph algorithms for realizing specific operations.

Ref: 14
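
As one example of a graph-specific operation, a shortest path (fewest edges) can be found with breadth-first search. This is a minimal sketch over a hypothetical adjacency-list dictionary; the node names are made up, and real graph databases ship their own, more elaborate implementations.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one path with the fewest edges from start to goal, or None."""
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start.
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in graph.get(current, []):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


# Hypothetical directed transport network.
transport = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": ["E"],
    "E": [],
}
route = shortest_path(transport, "A", "E")
print(route)
```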

Graph data model : Motivation and Application

Criticism of classical DB models: their drawbacks, and the difficulty for users of seeing data connectivity.

For applications whose complexity exceeds the capabilities of relational databases, e.g. managing transport networks.

Limited expressive power of current query languages.

The appearance of online hypertext showed the need for other DB models.

In technological networks, the spatial and geographical aspects of the structure are dominant.

Ref: 14

Database model : Components

Database model : Notions

Schema:

- A database schema is the skeleton of the database.

- It is designed before the database exists.

- A database schema does not contain any data or information.

Instance:

- It is the state of an operational database, with its data, at a given time.

- It contains a snapshot of the database.

- Database instances tend to change with time.

Graph database model : Definition

Ref: 14

Graph database model : representation

Representation of database:

flat graph: has many interconnected nodes; not expressive, extendible; difficult to present the information to the user in a clear way.

hypernode: a set of nested graphs; expressive; it is a graph whose nodes can themselves be graphs. Offers the ability to represent each real-world object as a separate database entity.
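
The nesting idea can be sketched with a small recursive structure. The `Hypernode` class, its fields and the sample data are hypothetical and only illustrate "a graph whose nodes can themselves be graphs"; they do not follow a formal definition from the literature.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Hypernode:
    """A labelled graph whose nodes are plain values or nested hypernodes."""
    label: str
    nodes: List[Union[str, "Hypernode"]] = field(default_factory=list)
    edges: List[Tuple[str, str]] = field(default_factory=list)


def depth(hypernode: Hypernode) -> int:
    """Nesting depth: 1 for a flat graph, more when nodes contain graphs."""
    nested = [depth(n) for n in hypernode.nodes if isinstance(n, Hypernode)]
    return 1 + max(nested, default=0)


# A nested ADDRESS graph becomes a single node of the PERSON graph,
# so each real-world object is its own entity.
address = Hypernode("ADDRESS", ["city", "Brussels"], [("city", "Brussels")])
person = Hypernode(
    "PERSON",
    ["name", "Ana", address],
    [("name", "Ana"), ("name", "ADDRESS")],
)
print(depth(address), depth(person))
```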

Graph database model : Data structures

1. Genealogy diagram example

Ref: 14

Graph database model : Data structures

1. Logical Data model

The schema uses two basic type nodes for representing data values (N, L), and two product type nodes (NL, PP) to establish relations among data values in a relational style. The instance is a collection of tables, one for each node of the schema.

Ref: 14

Graph database model : Data structures

2. Hypernode Data Model

The schema defines a person as a complex object with the properties name and lastname of type string, and parent of type person (recursively defined). The instance shows the relations in the genealogy among different instances of person.

Ref: 14
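
The recursive schema described above can be approximated with a self-referencing type. This is a sketch under assumptions: a single optional `parent` per person, and invented sample names; it is not the notation of the hypernode model itself.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Person:
    """Schema: name and lastname are strings, parent is again a Person."""
    name: str
    lastname: str
    parent: Optional["Person"] = None


def ancestors(person: Person) -> List[str]:
    """Follow the parent links upwards and collect the names."""
    chain = []
    current = person.parent
    while current is not None:
        chain.append(current.name)
        current = current.parent
    return chain


# Instance: three hypothetical persons linked into a small genealogy.
george = Person("George", "Jones")
ana = Person("Ana", "Jones", parent=george)
julia = Person("Julia", "Jones", parent=ana)

lineage = ancestors(julia)
print(lineage)
```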

Graph database model : Data structures

3. Hypergraph-Based Data Model (GROOVY)

GROOVY: Graphically Represented Object-Oriented data model with Values

The schema level models an object PERSON as a hypergraph that relates the attributes NAME, LASTNAME and PARENTS.

Ref: 14

Graph database model : Data structures

4. Graph Data Model (GDM)

Ref: 14

Graph database model : Integrity constraints

Integrity constraints are general statements and rules that define the set of consistent database states, or changes of state. In the case of graph db-models, they include:

Schema-instance integrity: entity types and type checking

Schema-instance separation: the degree to which schema and instance are different objects in the database

Redundancy of data: preserve uniqueness of data

Object identity and referential integrity: entity integrity assures that each hypernode is a unique real-world entity identified by its content; referential integrity requires that only existing entities be referenced.

Ref: 14
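
Two of these constraints can be checked mechanically. The sketch below assumes a hypothetical representation (a dictionary from node id to content, and a list of edges as id pairs) and invented sample data; it only illustrates referential integrity and content-based entity integrity.

```python
from collections import defaultdict


def dangling_references(nodes, edges):
    """Referential integrity: edges that point to a node that does not exist."""
    return [(src, dst) for src, dst in edges
            if src not in nodes or dst not in nodes]


def duplicated_entities(nodes):
    """Entity integrity: contents that are stored under more than one id."""
    ids_by_content = defaultdict(list)
    for node_id, content in nodes.items():
        ids_by_content[content].append(node_id)
    return {content: ids for content, ids in ids_by_content.items()
            if len(ids) > 1}


nodes = {
    "p1": ("Ana", "Jones"),
    "p2": ("George", "Jones"),
    "p3": ("Ana", "Jones"),        # same content as p1
}
edges = [("p1", "p2"), ("p3", "p9")]  # p9 is not a known node

broken = dangling_references(nodes, edges)
duplicates = duplicated_entities(nodes)
print(broken, duplicates)
```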

Graph database model : Comparison with other Database Models

Ref: 14

Graph Databases : Criticism

Yes, the graph model is more versatile than the relational model, but that does not make it universal: in some cases this versatility is a roadblock for optimizations.

In fact, modern graph databases are niche solutions for a narrow set of tasks: finding a route from A to B, working with friends in a social network, information technology in medicine.

For most business applications relational databases continue to prevail.

Graph Databases : Criticism

Relational databases were designed to aggregate data; graph databases were designed to find relations.

E.g.: in the financial domain, all connections are known; you only aggregate data by other data to find sums and so on.

Graph Databases : Criticism

You usually need to learn a new query language such as Cypher, Gremlin or SPARQL.

You have to use an API.

Fewer vendors to choose from, and a smaller user base, so it is harder to get support when you run into issues.

Graph Databases : Criticism

Graph databases are relatively immature compared to well-established RDBMS.

Requires a conceptual shift

No standardization

Graph Databases : Trends

Ref: 15

Graph Databases : The most popular

Ref: 16

Graph Query Language:

Cypher

Extended SQL

Gremlin

Graph Query Language: CQL

Ref: 17

Graph Query Language: Neo4j CQL Commands/Clauses

Graph Query Language: Cypher

Ref: 18

Graph Query Language : Extended SQL & Gremlin

OrientDB is a 2nd Generation Distributed Graph Database with the flexibility of Documents in one product. It is another graph DB tool that also operates as a document DB or an object-oriented database. Its query language is based on SQL to make it 'more familiar to TSQL developers'. Like Neo4J, it has a community edition, and enterprise licensing is described as very reasonable.

Ref: 19

Graph Query Language: Extended SQL & Gremlin

Ref: 20

Graph Query Language: OrientDB SQL: schema

Ref: 20

Graph Query Language: Populate OrientDB

Graph Query Language: Queries

Graph Query Language: Gremlin

Many graph databases have their own query languages (e.g. Cypher in Neo4j).

These languages are useful, but they do not carry over to other databases.

Gremlin is a powerful domain-specific traversal language for graph databases.

It is part of Apache TinkerPop and is supported by many popular graph databases.

Learning Gremlin for graph databases is comparable to learning SQL for relational databases.

Ref: 21
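
Gremlin expresses a query as a chain of traversal steps, for example `g.V(1).out('knows').values('name')`. The toy class below only imitates that chaining style in plain Python so the idea can be run without a server; the `Traversal` class, the dictionary layout and the sample data are all hypothetical and are not the Gremlin or TinkerPop API.

```python
class Traversal:
    """Toy step-chaining traversal; mimics the style only, not the Gremlin API."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = list(current)   # vertices the traversal is "standing on"

    def out(self, label):
        """Move along outgoing edges that carry the given label."""
        reached = [dst
                   for vertex in self.current
                   for edge_label, dst in self.graph["edges"].get(vertex, [])
                   if edge_label == label]
        return Traversal(self.graph, reached)

    def values(self, key):
        """End the traversal by reading one property of each current vertex."""
        return [self.graph["props"][vertex][key] for vertex in self.current]


# Hypothetical graph: vertex properties plus labelled outgoing edges.
graph = {
    "props": {1: {"name": "marko"}, 2: {"name": "vadas"}, 3: {"name": "lop"}},
    "edges": {1: [("knows", 2), ("created", 3)], 2: [], 3: []},
}

names = Traversal(graph, [1]).out("knows").values("name")
print(names)
```

Each step returns a new traversal, which is what makes the chained form possible; a real Gremlin traversal is evaluated lazily by the graph engine rather than eagerly as here.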

Graph Query Language: Gremlin

Ref: 22,23

References

[1] https://neo4j.com/blog/rdbms-graphs-basics-for-relational-developer/

[2] http://images.google.de/imgres?imgurl=http%3A%2F%2Fcdn2.business2community.com%2Fwp-content%2Fuploads%2F2014%2F03%2Fistock_000006832296xsmall_small.jpg&imgrefurl=http%3A%2F%2Fwww.business2community.com%2Fcontent-marketing%2Fbrand-boring-content-marketing-0819786&h=232&w=300&tbnid=GUe7dYIZl9-29M%3A&docid=jRPxqmLQ1TLKWM&ei=2V1uV4-eG8yWgAad_baAAQ&tbm=isch&client=firefox-b&iact=rc&uact=3&dur=226&page=3&start=42&ndsp=27&ved=0ahUKEwjP7pz8_MLNAhVMC8AKHZ2-DRAQMwiLASgrMCs&bih=634&biw=1366

[3] http://www.slideshare.net/infinitegraph/an-introduction-to-graph-databases, slide 5

[4] Trees and Hierarchies in SQL for Smarties, Joe Celko, Morgan Kaufmann, ISBN: 1558609202

[5] http://www.slideshare.net/ehildebrandt/trees-and-hierarchies-in-sql

[6] http://www.slideshare.net/navicorevn/hierarchical-data-models-in-relational-databases

[7] https://en.wikipedia.org/wiki/Data_model

[8] http://www.slideshare.net/slidarko/graph-windycitydb2010/25-Representing_a_Graph_in_a

[9] GRAPH DATABASES AND ORIENTDB. INFO-H-415: Advanced Databases (Project). Professor: Esteban Zimányi, cs.ulb.ac.be/public/_media/teaching/infoh415/student_projects/orientdb.pdf

[10] http://systemg.research.ibm.com/database.html

[11] https://www.youtube.com/watch?v=kpLqfFGubKM

[12] https://www.arangodb.com/2016/04/index-free-adjacency-hybrid-indexes-graph-databases/

[13] https://neo4j.com/blog/rdbms-vs-graph-data-modeling/

[14] Angles, R., & Gutierrez, C., "Survey of graph database models", ACM Computing Surveys, Vol. 40, No. 1, Article 1, Feb. 2008

[15] http://db-engines.com/en/ranking_trend/graph+dbms

[16] R. Campbell et al., "A performance evaluation of open source graph databases", ACM, PPAA '14, February 16, 2014.

[17] http://www.tutorialspoint.com/neo4j/neo4j_cql_introduction.htm

[18] https://neo4j.com/developer/cypher-query-language/

[19] http://orientdb.com/docs/last/index.html

[20] http://pettergraff.blogspot.de/2014/01/getting-started-with-orientdb.html

[21] http://www.fromdev.com/2013/09/Gremlin-Example-Query-Snippets-Graph-DB.html

[22] http://sql2gremlin.com/

[23] http://gremlindocs.spmallette.documentup.com/

Thank you!