NoSQL, Neo4j for Java developers, OracleWeek 2012

Post on 27-Jan-2015


DESCRIPTION

Neo4j for Java developers. Presented at OracleWeek 2012. Created by Eugene Hanikblum @ AlphaCSP

TRANSCRIPT

Seminar: BigData, NoSQL graph database for Java developers

Presenter: Evgeny Hanikblum

Data is getting bigger:

“Every 2 days we create as much information as we did up to 2003” – Eric Schmidt, Google

Big Data Technologies

NoSQL Overview

NoSQL -> Not Only SQL

Key Value Stores

• Most are based on Dynamo: Amazon's Highly Available Key-Value Store
• Data model:
– Global key-value mapping
– Big scalable HashMap
– Highly fault tolerant (typically)
• Pros:
– Simple data model
– Scalable
• Cons:
– Create your own “foreign keys”
– Poor for complex data
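The “create your own foreign keys” point can be made concrete with a minimal sketch, assuming a plain in-memory `HashMap` stands in for the store. The key naming scheme (`user:1:city`, `city:42:name`) and the `cityOfUser` helper are hypothetical, invented for this illustration; no particular product's API is implied.

```java
import java.util.HashMap;
import java.util.Map;

class KeyValueSketch {
    // The whole "database": one global key -> value mapping.
    static final Map<String, String> store = new HashMap<>();

    // Resolves a hand-rolled "foreign key": the value stored under
    // "<userKey>:city" is itself the key prefix of a city record.
    static String cityOfUser(String userKey) {
        String cityKey = store.get(userKey + ":city");
        if (cityKey == null) {
            return null; // no reference stored for this user
        }
        // The store does not guarantee that the referenced record exists.
        return store.get(cityKey + ":name");
    }

    public static void main(String[] args) {
        store.put("user:1:name", "Alice");
        store.put("user:1:city", "city:42"); // reference kept by convention only
        store.put("city:42:name", "Tel Aviv");
        System.out.println(cityOfUser("user:1"));
    }
}
```

Every relationship costs an extra lookup written by hand, and a dangling reference simply resolves to `null`; this is the sense in which the model is poor for complex data.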

Column Databases
• Most are based on BigTable: Google’s Distributed Storage System for Structured Data
• Data model:
– A big table, with column families
– Map Reduce for querying/processing
• Pros:
– Supports semi-structured data
– Naturally indexed (columns)
– Scalable
• Cons:
– Poor for interconnected data

Document Databases

• Data model:
– A collection of documents
– A document is a key-value collection
– Index-centric, lots of map-reduce
• Pros:
– Simple, powerful data model
– Scalable
• Cons:
– Poor for interconnected data
– Query model limited to keys and indexes
– Map reduce for larger queries
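As a rough illustration of “a document is a key-value collection” and of a query model limited to keys and indexes, here is a small sketch in plain Java. The class, the `byLanguage` index, and the `whereSpoken` helper are assumptions made for this example and do not mirror any real document database API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DocumentSketch {
    // A collection of documents, each one a key-value map, addressed by id.
    static final Map<String, Map<String, Object>> countries = new HashMap<>();
    // Secondary index: language -> ids of the documents that list it.
    static final Map<String, Set<String>> byLanguage = new HashMap<>();

    static void insert(String id, Map<String, Object> doc) {
        countries.put(id, doc);
        Object langs = doc.get("languages_spoken");
        if (langs instanceof List<?>) {
            for (Object lang : (List<?>) langs) {
                byLanguage.computeIfAbsent(lang.toString(), k -> new TreeSet<>()).add(id);
            }
        }
    }

    // Answerable only because an index on this field was maintained.
    static Set<String> whereSpoken(String language) {
        return byLanguage.getOrDefault(language, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        Map<String, Object> canada = new HashMap<>();
        canada.put("name", "Canada");
        canada.put("languages_spoken", Arrays.asList("English", "French"));
        insert("ca", canada);
        System.out.println(whereSpoken("French"));
    }
}
```

Any question the index was not built for (for example, which countries share a language with Canada) needs a scan or a map-reduce pass over all documents.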

Graph Databases
• Data model:
– Nodes and relationships
• Pros:
– Powerful data model, as general as RDBMS
– Connected data locally indexed
– Easy to query
• Cons:
– Sharding (lots of people are working on this); scales up reasonably well
– Requires rewiring your brain

Why do you need a GraphDB?

GraphDB Overview

Because data has expanded into relationships

Because data has become interconnected

When should I use it?

Use a graph DB if you have to deal with something like this:

or this …

or this …

Data is more connected:
• Text (content)
• HyperText (added pointers)
• RSS (joined those pointers)
• Blogs (added pingbacks)
• Tagging (grouped related data)
• RDF (described connected data)
• GGG (content + pointers + relationships + descriptions)


Data is less structured:

• If you tried to collect all the data of every movie ever made, how would you model it?

• Actors, Characters, Locations, Dates, Costs, Ratings, Showings, Ticket Sales, etc.


What is a Graph?

• An abstract representation of a set of objects where some pairs are connected by links.

Object (Vertex, Node)

Link (Edge, Arc, Relationship)

Different Kinds of Graphs
• Undirected Graph
• Directed Graph
• Pseudo Graph
• Multi Graph
• Hyper Graph

More Kinds of Graphs

• Weighted Graph

• Labeled Graph

• Property Graph
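Of the kinds listed above, the property graph is the one Neo4j uses: directed, typed relationships, with key-value properties on both nodes and relationships. A minimal sketch in plain Java follows, assuming hypothetical `Node` and `Relationship` classes written for this illustration (they are not Neo4j's interfaces of the same names).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PropertyGraphSketch {
    static class Node {
        final Map<String, Object> properties = new HashMap<>();
        final List<Relationship> outgoing = new ArrayList<>();
    }

    static class Relationship {
        final Node start;
        final Node end;
        final String type; // the label on the edge
        final Map<String, Object> properties = new HashMap<>();

        Relationship(Node start, String type, Node end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    // Creates a directed, typed relationship and registers it on its start node.
    static Relationship connect(Node start, String type, Node end) {
        Relationship r = new Relationship(start, type, end);
        start.outgoing.add(r);
        return r;
    }

    public static void main(String[] args) {
        Node english = new Node();
        english.properties.put("name", "English");
        Node canada = new Node();
        canada.properties.put("name", "Canada");

        Relationship spokenIn = connect(english, "IS_SPOKEN_IN", canada);
        spokenIn.properties.put("as_primary", true);

        System.out.println(english.properties.get("name") + " " + spokenIn.type
                + " " + canada.properties.get("name"));
    }
}
```

Because two nodes may be joined by several relationships of different types, the same structure also covers the multigraph and labeled-graph cases.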

What is a Graph DB?
• A database with an explicit graph structure
• Each node knows its adjacent nodes
• As the number of nodes increases, the cost of a local step (or hop) remains the same
• Plus an index for lookups
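The constant cost of a hop follows from each node holding direct references to its neighbours. The sketch below, in plain Java, is one way to picture that; the names `create`, `link`, `hop`, and `index` are made up for this example and say nothing about how Neo4j stores data on disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AdjacencySketch {
    static class Node {
        final String name;
        final List<Node> neighbours = new ArrayList<>(); // each node knows its adjacent nodes

        Node(String name) {
            this.name = name;
        }
    }

    // The index is used only to find a starting point, never during a hop.
    static final Map<String, Node> index = new HashMap<>();

    static Node create(String name) {
        Node node = new Node(name);
        index.put(name, node);
        return node;
    }

    static void link(Node a, Node b) {
        a.neighbours.add(b);
        b.neighbours.add(a);
    }

    // One hop reads only this node's own neighbour list, so its cost depends
    // on the node's degree, not on how many nodes exist in total.
    static List<String> hop(Node from) {
        List<String> names = new ArrayList<>();
        for (Node n : from.neighbours) {
            names.add(n.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Node a = create("a");
        Node b = create("b");
        link(a, b);
        for (int i = 0; i < 100000; i++) {
            create("unrelated-" + i); // growing the graph does not slow the hop below
        }
        System.out.println(hop(index.get("a")));
    }
}
```

A relational join, by contrast, typically goes through an index lookup per step, and that index grows with the table.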

Compared to Relational Databases
• Relational databases: optimized for aggregation
• Graph databases: optimized for connections

What is Neo4j?
• A Java-based graph database
• Property Graph
• Full ACID (atomicity, consistency, isolation, durability)
• High Availability (with Enterprise Edition)
• 32 Billion Nodes, 32 Billion Relationships, 64 Billion Properties
• Embedded Server
• REST API
• Both nodes and relationships can have metadata.
• Integrated pattern-matching-based query language (“Cypher”).
• Also the “Gremlin” graph traversal language can be used.
• Indexing of nodes and relationships. (Lucene)
• Nice self-contained web admin.
• Advanced path-finding with multiple algorithms.
• Optimized for reads.
• Has transactions (in the Java API)
• Scriptable in Groovy
• Online backup, advanced monitoring and High Availability are AGPL/commercial licensed

Neo4j is good for:
• Highly connected data (social networks)
• Recommendations (e-commerce)
• Path finding (how do I know you?)
• A* (least-cost path)
• Data-first schema (bottom-up, but you still need to design)

how do I know you?
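The “how do I know you?” question is a shortest-path search over “knows” relationships. The following is a breadth-first search sketch in plain Java over an adjacency map; it does not use Neo4j's built-in path-finding algorithms, and the method name, the directed reading of “knows”, and the sample names are assumptions made for this illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

class HowDoIKnowYou {
    // Returns the shortest chain of acquaintances from 'from' to 'to',
    // or an empty list if the two are not connected.
    static List<String> shortestPath(Map<String, List<String>> knows, String from, String to) {
        Map<String, String> cameFrom = new HashMap<>(); // visited person -> predecessor
        Deque<String> queue = new ArrayDeque<>();
        cameFrom.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            if (current.equals(to)) {
                // Walk the predecessor links back to the start.
                LinkedList<String> path = new LinkedList<>();
                for (String p = to; p != null; p = cameFrom.get(p)) {
                    path.addFirst(p);
                }
                return path;
            }
            for (String next : knows.getOrDefault(current, Collections.<String>emptyList())) {
                if (!cameFrom.containsKey(next)) {
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, List<String>> knows = new HashMap<>();
        knows.put("me", Arrays.asList("anna", "bob"));
        knows.put("anna", Arrays.asList("carl"));
        knows.put("bob", Arrays.asList("you"));
        knows.put("carl", Arrays.asList("you"));
        System.out.println(shortestPath(knows, "me", "you"));
    }
}
```

Breadth-first search visits people in order of distance, so the first time the target is dequeued the recorded chain is a shortest one.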

how can I get there?

If you’ve ever
• Joined more than 7 tables together
• Modeled a graph in a table
• Written a recursive CTE
• Tried to write some crazy stored procedure with multiple recursive self and inner joins

You should use Neo4j

Rewiring your brain

Graph model:
• Language (name, code, word_count)
• Country (name, code, flag_uri)
• IS_SPOKEN_IN (as_primary): Language -> Country

Relational model:
• Language (language_code, language_name, word_count)
• Country (country_code, country_name, flag_uri)
• LanguageCountry (language_code, country_code, primary)

Document model:
• name: “Canada”, languages_spoken: “[ ‘English’, ‘French’ ]”

Graph instance:
• Country nodes: name: “Canada”, name: “USA”, name: “France”
• Language nodes: language: “English”, language: “French”
• Four spoken_in relationships connect the language nodes to the country nodes

Rewiring your brain

One wide node:
• Country (name, flag_uri, language_name, number_of_words, yes_in_language, no_in_language, currency_code)

Split into connected nodes:
• Country (name, flag_uri)
• Language (name, number_of_words, yes, no)
• Currency (code, name)
• Relationships: Country SPEAKS Language, Country USES_CURRENCY Currency

show me the code!

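// PresentationTypes is assumed to be an enum implementing org.neo4j.graphdb.RelationshipType
GraphDatabaseService graphDb = new EmbeddedGraphDatabase("var/neo4j");

// Write operations must run inside a transaction
Transaction tx = graphDb.beginTx();
try {
    Node david = graphDb.createNode();
    Node andreas = graphDb.createNode();

    david.setProperty("name", "David Montag");
    andreas.setProperty("name", "Andreas Kollegger");

    Relationship presentedWith = david.createRelationshipTo(andreas, PresentationTypes.PRESENTED_WITH);
    presentedWith.setProperty("date", System.currentTimeMillis());

    tx.success();
} finally {
    tx.finish();
}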

Neo4j data browser


Neoclipse

console.neo4j.org

Try it right now:

start n=node(*) match n-[r:LOVES]->m return n, type(r), m

Notice the two nodes in red; they are your result set.
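To read the query above: it starts from every node `n`, follows outgoing relationships of type `LOVES` to a node `m`, and returns each matching triple. The same pattern match, written by hand in plain Java over an in-memory edge list, might look like the sketch below. The `Edge` class, the `match` helper, and the sample names are invented for this illustration; this is not how Neo4j evaluates Cypher.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LovesPatternSketch {
    static class Edge {
        final String start;
        final String type;
        final String end;

        Edge(String start, String type, String end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    // Equivalent in spirit to: match n-[r:TYPE]->m return n, type(r), m
    static List<String> match(List<Edge> edges, String type) {
        List<String> rows = new ArrayList<>();
        for (Edge e : edges) {
            if (e.type.equals(type)) {
                rows.add(e.start + " " + e.type + " " + e.end);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample data, not the console's actual dataset.
        List<Edge> edges = Arrays.asList(
                new Edge("Alice", "KNOWS", "Bob"),
                new Edge("Alice", "LOVES", "Carol"),
                new Edge("Bob", "KNOWS", "Carol"));
        for (String row : match(edges, "LOVES")) {
            System.out.println(row);
        }
    }
}
```

The declarative query states only the shape of the result; the loop and filter are left to the database.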

Spring-Data-Neo4J

• Focus on Spring Data Neo4j
• VMware is collaborating with Neo Technology, the company behind the Neo4j graph database.
• Improved programming model: annotation-based programming model for applications with rich domain models
• Cross-store persistence: extend an existing JPA application with NoSQL persistence


Spring-Data-Neo4J

@NodeEntity

Spring-Data-Neo4J

@NodeEntity
public class Actor {
    private String name;
    private int age;
    private HairColor hairColor;
    private transient String nickname;
}
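One way to picture what an annotation like `@NodeEntity` enables is a reflection pass that copies an object's fields into node properties and skips `transient` ones such as `nickname`. The sketch below only illustrates that idea under stated assumptions: the `NodeEntity` annotation declared here is a hypothetical stand-in, `toProperties` is an invented helper, the `hairColor` field is omitted for brevity, and none of this is Spring Data Neo4j's actual implementation.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

class NodeMappingSketch {
    // Hypothetical stand-in for the real annotation; declared here so the sketch compiles alone.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface NodeEntity {
    }

    @NodeEntity
    static class Actor {
        private String name;
        private int age;
        private transient String nickname; // should not end up in the graph

        Actor(String name, int age, String nickname) {
            this.name = name;
            this.age = age;
            this.nickname = nickname;
        }
    }

    // Copies every non-transient, non-static field into a property map.
    static Map<String, Object> toProperties(Object entity) {
        Class<?> type = entity.getClass();
        if (!type.isAnnotationPresent(NodeEntity.class)) {
            throw new IllegalArgumentException(type.getName() + " is not a node entity");
        }
        Map<String, Object> properties = new TreeMap<>();
        for (Field field : type.getDeclaredFields()) {
            int modifiers = field.getModifiers();
            if (Modifier.isTransient(modifiers) || Modifier.isStatic(modifiers) || field.isSynthetic()) {
                continue;
            }
            field.setAccessible(true);
            try {
                properties.put(field.getName(), field.get(entity));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return properties;
    }

    public static void main(String[] args) {
        System.out.println(toProperties(new Actor("Jane Doe", 30, "JD")));
    }
}
```

The resulting map is the kind of key-value set that could then be written onto a node with `setProperty`.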

Spring-Data-Neo4J

@NodeEntity
public class Movie {
    @GraphId Long id;

    @Indexed(type = FULLTEXT, indexName = "search")
    String title;

    Person director;

    @RelatedTo(type = "ACTS_IN", direction = INCOMING)
    Set<Person> actors;

    @RelatedToVia(type = "RATED")
    Iterable<Rating> ratings;

    @Query("start movie=node({self}) match movie-->genre<--similar return similar")
    Iterable<Movie> similarMovies;
}

@RelationshipEntity

Spring-Data-Neo4J

@RelationshipEntity
public class Role {
    @StartNode private Actor actor;
    @EndNode private Movie movie;
    private String roleName;
}

Spring-Data-Neo4J

@RelationshipEntity
public class Role {
    @StartNode private Actor actor;
    @EndNode private Movie movie;
    private String roleName;
}

@NodeEntity
public class Actor {
    @RelatedToVia(type = "ACTS_IN")
    private Iterable<Role> roles;
}

How did they do that?

NoSQL -> Graph DB -> Neo4j

Lecturer: Evgeny Hanikblum @ AlphaCSP : OracleWeek 2012 : Israel

Email: evgenyh@alphacsp.com
