Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Neo4j Theory and Practice Tareq Abedrabbo Graph Connect - 19/11/2013

Upload: neo4j-the-open-source-graph-database

Post on 27-Jan-2015


DESCRIPTION

In this talk Tareq will discuss graph solutions based on his experiences building a varied mix of graph-based systems. He will be sharing techniques and approaches that he has learned and will focus on a number of concepts that may be applied to a wider context.

TRANSCRIPT

Page 1: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Neo4j Theory and Practice

Tareq Abedrabbo Graph Connect - 19/11/2013

Page 2: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

About me

• CTO/Principal Consultant at OpenCredo

• Working with Neo4j for (almost) 3 years on a number of different projects

• Co-author of Neo4j in Action (Manning)

Page 3: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

What is this talk about?

Page 4: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

It’s for developers designing and building applications with Neo4j

Page 5: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

It’s not a collection of war stories but I will refer to real-world examples

Page 6: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

It is about sharing thoughts and lessons learnt in a useful way

Page 7: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

“If I'm to believe Twitter, half of the earth's population are importing Wikipedia into Neo4j, for very obscure reasons.”

Page 8: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Agenda

Page 9: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

• What is Neo4j?

• Approaching graph-based applications

• Design

• Implementation

• Test

• Use cases

• Lessons learnt

Page 10: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

What really is Neo4j?

Page 11: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

A graph model

Page 12: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

A query engine

Page 13: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

A database

Page 14: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Neo4j is a solid foundation on which to build graph-based applications

Page 15: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

How should I approach graph-based applications?

Page 16: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Is there a useful way to categorise graph-based applications?

Page 17: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Domain-centric applications

Page 18: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Data-centric applications

Page 19: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Domain-Centric

• Well-defined data model

• Data changes through user interactions

• Flexible but predictable data structure(s)

• Recommendation engines, social networks, etc…

• Top-down design

Page 20: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Data-Centric

• Complex connected data that typically models real-world networks

• Integrated from a variety of different sources

• Data can be unpredictable

• Telco networks, utility networks, etc…

• Bottom-up design

Page 21: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Typically, applications fall somewhere between these two types

Page 22: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

How can I use the information available in my graph?

Page 23: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

• Search and pattern-matching

    • Find a recommendation based on behaviour

• Graph algorithms

    • Shortest path, disconnected components

• Optimisation

    • Maximise oil flow while minimising water
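
As an illustration of the "graph algorithms" category, here is a minimal plain-Java sketch of an unweighted shortest path found by breadth-first search. It is not taken from the talk and does not use the Neo4j API: the class name ShortestPath, the method find and the adjacency-map representation are assumptions made for this example, and edges are treated as directed.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Neo4j: breadth-first shortest path
// over a directed graph stored as an adjacency map.
class ShortestPath {

    // Returns the nodes on one shortest path from start to end,
    // or an empty list when end cannot be reached from start.
    static List<String> find(Map<String, List<String>> graph, String start, String end) {
        Map<String, String> previous = new HashMap<>(); // node -> node it was reached from
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(start);
        queue.add(start);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(end)) {
                // Walk the "previous" links back to the start to rebuild the path.
                LinkedList<String> path = new LinkedList<>();
                for (String node = end; node != null; node = previous.get(node)) {
                    path.addFirst(node);
                }
                return path;
            }
            for (String next : graph.getOrDefault(current, Collections.emptyList())) {
                if (visited.add(next)) { // add() is false when the node was already seen
                    previous.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Collections.singletonList("b"));
        graph.put("b", Collections.singletonList("c"));
        System.out.println(find(graph, "a", "c"));
    }
}
```

In a Neo4j application the same question would normally be put to the database rather than computed in application code; the sketch only shows the shape of the algorithm.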

Page 24: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Graphs are naturally data-driven

Page 25: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Use case 1: Network Impact Analysis

Page 26: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
Page 27: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Requirement: Identify the impact of failing components

Page 28: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Requirement: Identify interesting patterns, such as single points of failure

Page 29: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Labelled property graph is a natural fit for the model

Page 30: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Additional “dimensions” can be added to capture abstract concepts: network redundancy, load-balancing

Page 31: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Cypher queries are a natural way to deliver the different requirements
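
To make the impact-analysis requirement concrete, the following plain-Java sketch collects every component that depends, directly or transitively, on a failed one. It is an assumption-laden stand-in, not the project's code: the talk delivers this with Cypher queries, whereas the class ImpactAnalysis, its methods and the component names below are invented for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model: component -> components that directly depend on it.
class ImpactAnalysis {
    private final Map<String, List<String>> dependents = new HashMap<>();

    void addDependency(String component, String dependent) {
        dependents.computeIfAbsent(component, key -> new ArrayList<>()).add(dependent);
    }

    // Breadth-first traversal from the failed component; every node reached is impacted.
    Set<String> impactOf(String failed) {
        Set<String> impacted = new LinkedHashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(failed);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String next : dependents.getOrDefault(current, Collections.emptyList())) {
                if (impacted.add(next)) { // only enqueue components seen for the first time
                    queue.add(next);
                }
            }
        }
        impacted.remove(failed); // a dependency cycle can lead back to the failed component
        return impacted;
    }

    public static void main(String[] args) {
        ImpactAnalysis network = new ImpactAnalysis();
        network.addDependency("router", "switch");
        network.addDependency("switch", "server");
        System.out.println(network.impactOf("router"));
    }
}
```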

Page 32: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Use case 2: Oil flow optimisation

Page 33: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
Page 34: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Requirement: Identify candidate configurations to maximise flow

Page 35: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Requirement: Identify the most practical and valuable adjustments to the network

Page 36: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Simply connected graph with complex components

Page 37: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Interlude: Genetic Algorithms

Page 38: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

• Start from an initial population of candidate solutions (individuals or phenotypes), ideally random

• Attribute a score to each solution using a fitness function

    • The only place with specific business knowledge

• Apply genetic operators to create a new generation

    • Cross-breeding to retain best characteristics from each parent

    • Mutation to maintain diversity and to avoid converging to a local optimum too quickly

• Stop when you want!
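
The steps above can be sketched in a few lines of Java. This is a generic toy, not the oil-flow optimiser from the talk: the fitness function simply counts "true" genes as a placeholder for real business knowledge, and the class name GeneticSketch, the single-point crossover, the 2% mutation rate, the selection from the fitter half and the decision to carry the best individual over unchanged are all assumptions chosen for brevity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

class GeneticSketch {
    static final int GENES = 16;

    // Fitness function: the only place with business knowledge.
    // Placeholder here: the number of genes set to true.
    static int fitness(boolean[] individual) {
        int score = 0;
        for (boolean gene : individual) {
            if (gene) score++;
        }
        return score;
    }

    static boolean[] randomIndividual(Random rnd) {
        boolean[] individual = new boolean[GENES];
        for (int i = 0; i < GENES; i++) individual[i] = rnd.nextBoolean();
        return individual;
    }

    // Cross-breeding: genes before the cut come from one parent, the rest from the other.
    static boolean[] crossover(boolean[] a, boolean[] b, Random rnd) {
        int cut = rnd.nextInt(GENES);
        boolean[] child = new boolean[GENES];
        for (int i = 0; i < GENES; i++) child[i] = i < cut ? a[i] : b[i];
        return child;
    }

    // Mutation: flip each gene with a small probability to maintain diversity.
    static void mutate(boolean[] individual, double rate, Random rnd) {
        for (int i = 0; i < GENES; i++) {
            if (rnd.nextDouble() < rate) individual[i] = !individual[i];
        }
    }

    static boolean[] evolve(int populationSize, int generations, long seed) {
        if (populationSize < 2) throw new IllegalArgumentException("populationSize must be at least 2");
        Random rnd = new Random(seed);
        List<boolean[]> population = new ArrayList<>();
        for (int i = 0; i < populationSize; i++) population.add(randomIndividual(rnd));
        Comparator<boolean[]> fittestFirst =
                Comparator.comparingInt((boolean[] ind) -> fitness(ind)).reversed();
        for (int g = 0; g < generations; g++) { // "stop when you want": a fixed generation count
            population.sort(fittestFirst);
            List<boolean[]> next = new ArrayList<>();
            next.add(population.get(0).clone()); // keep the current best unchanged
            while (next.size() < populationSize) {
                // Parents are drawn from the fitter half of the population.
                boolean[] p1 = population.get(rnd.nextInt(populationSize / 2));
                boolean[] p2 = population.get(rnd.nextInt(populationSize / 2));
                boolean[] child = crossover(p1, p2, rnd);
                mutate(child, 0.02, rnd);
                next.add(child);
            }
            population = next;
        }
        population.sort(fittestFirst);
        return population.get(0);
    }

    public static void main(String[] args) {
        boolean[] best = evolve(20, 30, 42L);
        System.out.println("Best fitness: " + fitness(best) + " of " + GENES);
    }
}
```

Only fitness() would need to change to apply the same loop to a different problem, which is the point the slide makes about business knowledge living in one place.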

Page 39: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013
Page 40: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Is this even a use case for Neo4j?

Page 41: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Persist and share calculated solutions

Page 42: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Inspect intermediary steps

Page 43: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Use Cypher queries to interrogate solutions

Page 44: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Lessons learnt

Page 45: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Understand your domain

Page 46: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

• Don’t follow “best practices” blindly

• For domain-centric applications you can use a mapping framework, such as Spring Data Neo4j

• For data-centric applications, you should stay as close as possible to the graph model

• In any case, don’t try to hide the graph!

Page 47: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Use Cypher

Page 48: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013


• Expressive

• Readable

• Maintainable

• Performant

• Cypher + the web console is the quickest way to experiment and to prototype solutions

Page 49: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Manage complexity with domain knowledge

Page 50: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

• Graph algorithms are typically complex

• Knowledge of the domain can simplify queries and traversals

• Make Cypher queries as specific as possible

• Take “shortcuts” when you know the domain

Page 51: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Write robust and flexible code

Page 52: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

• Break down problems into small queries. Return graph resources (or ids) to chain queries.

• Robustness principle: “Be conservative in what you do, be liberal in what you accept from others”

• Use assertions as preconditions

• Assertions document intent

• Fail fast if data doesn’t match
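
One way to read these points together is sketched below in plain Java: a first small "query" returns ids, an assertion checks the precondition that exactly one id came back, and only then is the second "query" run. Everything here is hypothetical. The two maps stand in for real Cypher queries, and the names ChainedQueries, single and devicesAt are invented for the example.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ChainedQueries {
    // Hypothetical stand-ins for two small queries; a real application would run Cypher here.
    static final Map<String, List<Long>> SITE_IDS_BY_NAME = new HashMap<>();
    static final Map<Long, List<String>> DEVICES_BY_SITE_ID = new HashMap<>();

    static {
        SITE_IDS_BY_NAME.put("london", Arrays.asList(1L));
        SITE_IDS_BY_NAME.put("paris", Arrays.asList(2L, 3L)); // duplicated site: unexpected data
        DEVICES_BY_SITE_ID.put(1L, Arrays.asList("router-1", "switch-1"));
    }

    // Precondition check: documents the intent ("exactly one") and fails fast otherwise.
    static <T> T single(Collection<T> results, String description) {
        if (results.size() != 1) {
            throw new IllegalStateException(
                    "Expected exactly one " + description + " but found " + results.size());
        }
        return results.iterator().next();
    }

    // Chains the two queries through the id returned by the first one.
    static List<String> devicesAt(String siteName) {
        List<Long> ids = SITE_IDS_BY_NAME.getOrDefault(siteName, Collections.emptyList());
        long siteId = single(ids, "site named '" + siteName + "'");
        return DEVICES_BY_SITE_ID.getOrDefault(siteId, Collections.emptyList());
    }

    public static void main(String[] args) {
        System.out.println(devicesAt("london"));
    }
}
```

Calling devicesAt("paris") throws immediately with a message that names the violated assumption, instead of silently picking one of the two ids.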

Page 53: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Start with a representative dataset

Page 54: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

• Create small datasets to capture the initial use cases

• Write simple unit tests using these datasets to support design and implementation

• These tests tend to become less useful when requirements are better understood

• Throw them away!

Page 55: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Move to a realistic dataset as soon as possible

Page 56: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

• A realistic dataset

    • Should capture the complexity of the real data

    • Should be sufficiently large

    • Ideally based on production data

• Write functional and integration tests against this dataset

Page 57: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Test non-functional aspects

Page 58: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

• Graph data is inherently flexible and evolving

• Queries need to be correct and sufficiently performant

• Existing queries’ performance can degrade as the underlying model changes

• Assertions on timeouts should be part of the test suite to detect loops and poor performance

• JUnit’s @Test(timeout=5)

• Spring’s @Timed(millis=5)
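
For readers without a test framework at hand, the idea of a timeout assertion can be sketched with the standard library alone. This is not how JUnit or Spring implement their annotations; the class TimeoutAssertion and the method assertCompletesWithin are names invented here. Note that the timeout attribute of JUnit 4's @Test is expressed in milliseconds.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs a query-like task and fails if it exceeds a time budget.
class TimeoutAssertion {

    static <T> T assertCompletesWithin(long millis, Callable<T> query) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<T> future = executor.submit(query);
        try {
            // Blocks until the task finishes or the budget is exhausted.
            return future.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the task that is still running
            throw new AssertionError("Query did not complete within " + millis + " ms");
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // The lambda is a placeholder for a real query or traversal.
        int result = assertCompletesWithin(1000, () -> 1 + 1);
        System.out.println("Completed in time with result " + result);
    }
}
```

A traversal that loops forever, or a query that has slowed down after a model change, then shows up as a failing test rather than a hanging build.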

Page 59: Neo4j Theory and Practice - Tareq Abedrabbo @ GraphConnect London 2013

Links

• Twitter: @tareq_abedrabbo

• Blog: http://www.terminalstate.net

• OpenCredo: http://www.opencredo.com

Thank you!