hadoop & no sql new generation database systems


Upload: ramazan-firin

Post on 27-Jan-2015


TRANSCRIPT

Page 1: Hadoop & no sql   new generation database systems

This document is intended for only AVEA İletişim Hizmetleri A.Ş.("AVEA"), its dealers, employees and/or others specifically authorised. The contents of this document are confidential and any disclosure, copying, distribution and/or taking any action in reliance with the content of this document is prohibited. AVEA is not liable for the transmission of this document in any manner to any third parties that are not authorised to receive.

Hadoop & NoSQL

New Generation Database Systems

Ramazan FIRIN

22.04.2014

Page 2

AGENDA

• Big Data

• Hadoop

• NoSQL

• Graph DB and Neo4j

• Possible Usage in Telco

• Demo

Page 3

Executive Summary

AVEA

• Big Data is a new IT trend

• Hadoop and NoSQL can be used to process Big Data

• Possible usage areas in Telco:

- Prevent churn

- Offer customer-specific campaigns

- Acquire more customers

Page 4

Big Bang = Big Data

Page 5

What is Big Data?

Datasets that are too awkward to work with using traditional, hands-on database management tools.

Page 6

Big Data - 3V Concept

Page 7

Big Data To Smart Data

Cover of The Economist

Page 8

Big Data Sources

1. Social network profiles - Facebook, LinkedIn, Yahoo, Google

2. Social influencers - blog comments, user forums, review sites

3. Activity-generated data - application logs, sensor data

4. Public - Wikipedia, IMDb, etc.

5. Data warehouse appliances - transactional data

6. Network and in-stream monitoring

7. Legacy documents

Page 9

Big Data Approach

Page 10

Sample Usage - 360-Degree View of the Customer

Page 11

Big Data Solutions – Oracle Big Data Appliance

Page 12

Big Data Solutions – IBM PureData

Page 13

Storage for Big Data

If we can't use a relational database, how can we store it?

1) Hadoop

2) NoSQL

Page 14

What is HADOOP?

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
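The "simple programming model" usually meant here is MapReduce. As a rough illustration only, the sketch below runs the map, shuffle and reduce phases of a word count inside a single Python process. It does not use the Hadoop API and the function names are made up for the example; in a real cluster the mappers and reducers run on different nodes and the shuffle moves data over the network.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group all values emitted for the same key.
    # In Hadoop this step moves data between nodes; here it is just a dict.
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["big data big clusters", "hadoop processes big data"]
    print(word_count(docs))
    # {'big': 3, 'clusters': 1, 'data': 2, 'hadoop': 1, 'processes': 1}
```

Because each mapper only looks at its own line and each reducer only at one key, both phases can be spread across machines; that independence is what makes the model scale.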

Page 15

History

Page 16

Hadoop Components

Page 17

HADOOP ARCHITECTURE

Page 18

Hadoop Ecosystem

Pig - a data processing language that simplifies Hadoop programming

Hive - SQL-like queries

HBase - random read/write access to tables with billions of rows and millions of columns (NoSQL)

Page 19

NoSQL

Page 20

RDBMS PERFORMANCE

Page 21

Joins are performance killers...

Page 22

What is NoSQL?

• Stands for "Not Only SQL"

• Non-relational

• Cheap, easy to implement

• Scalability

– Vertically - add more resources (CPU, RAM, disk) to a single server

– Horizontally - add more servers (nodes) to the cluster

• No predefined schema

• No join operations

• Often not fully ACID; the trade-offs are described by the CAP theorem
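Horizontal scaling normally goes together with partitioning (sharding): each node stores only part of the keys. A minimal sketch, assuming simple hash-modulo routing and hypothetical node names; production systems such as Cassandra use consistent hashing instead, because with plain modulo routing a change in the number of nodes remaps most keys.

```python
import hashlib


def shard_for(key: str, nodes: list[str]) -> str:
    # Use a stable hash (not Python's per-process randomized hash())
    # so every client routes the same key to the same node.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


def partition(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    # Group keys by the node that would store them.
    placement: dict[str, list[str]] = {node: [] for node in nodes}
    for key in keys:
        placement[shard_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]  # hypothetical cluster
    keys = [f"subscriber:{i}" for i in range(10)]
    for node, owned in partition(keys, nodes).items():
        print(node, owned)
```

Adding capacity then means adding a node to the list instead of buying a bigger server, which is the horizontal option from the slide.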

Page 23

Key-Value Stores

- Redis, Voldemort

Page 24

Redis Features

• Data Types

• Publish / Subscribe

• Transactions

• Replication

• Persistence

• Partition


Page 25

Redis Data Types

• Strings

• Lists

• Sets

• Sorted Sets

• Hashes

Page 26

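Redis Persistence

• RDB - takes point-in-time snapshots at configured intervals

Fast

May lose several minutes of data if the process is killed with kill -9

• AOF - append-only file that logs every write operation

Still fast enough

May lose up to 1 second of data if the process is killed with kill -9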
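The difference between the two modes can be illustrated with a toy in-memory store. This is not Redis code: ToyStore and its methods are hypothetical, a "crash" simply empties the in-memory dict, RDB is modelled as a copy of the data taken on demand, and AOF as a list of writes replayed on restart. Fsync timing is ignored, so this toy AOF never loses the last second of writes the way a real one can.

```python
class ToyStore:
    """Hypothetical key-value store with RDB-like snapshots and an AOF-like write log."""

    def __init__(self) -> None:
        self.data: dict[str, str] = {}
        self.snapshot: dict[str, str] = {}    # last point-in-time copy (RDB-like)
        self.aof: list[tuple[str, str]] = []  # every write, in order (AOF-like)

    def set(self, key: str, value: str) -> None:
        self.data[key] = value
        self.aof.append((key, value))  # append each write to the log

    def save_snapshot(self) -> None:
        # Copy the whole dataset; real deployments do this at intervals.
        self.snapshot = dict(self.data)

    def crash(self) -> None:
        # Simulates kill -9: everything held only in memory is gone.
        self.data = {}

    def restore_from_snapshot(self) -> None:
        self.data = dict(self.snapshot)

    def restore_from_aof(self) -> None:
        # Rebuild the dataset by replaying the logged writes in order.
        self.data = {}
        for key, value in self.aof:
            self.data[key] = value


if __name__ == "__main__":
    store = ToyStore()
    store.set("a", "1")
    store.save_snapshot()
    store.set("b", "2")  # written after the last snapshot
    store.crash()
    store.restore_from_snapshot()
    print(store.data)  # {'a': '1'}: the write to "b" is lost
    store.crash()
    store.restore_from_aof()
    print(store.data)  # {'a': '1', 'b': '2'}
```

The snapshot loses everything written since it was taken, while replaying the log recovers every logged write, at the cost of a file that grows with each operation.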

Page 27

Redis Commands

$ redis-cli set counter 100
OK
$ redis-cli incr counter
(integer) 101
$ redis-cli incr counter
(integer) 102
$ redis-cli incrby counter 10
(integer) 112

Set type commands:

Add : SADD

Read : SPOP (also removes the returned member), SRANDMEMBER, SMEMBERS

Delete : SREM

Other : SINTER, SUNION, SCARD, SDIFF, SMOVE, SISMEMBER
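To make the semantics of these commands concrete without a running server, here is a small in-process imitation. It is only a sketch: MiniRedis is a hypothetical class covering a handful of commands. It mimics two Redis rules: INCR/INCRBY work on a string value that must parse as an integer (a missing key counts as 0), and the set commands work on unordered collections of unique members. Python's int() is slightly more lenient than Redis when parsing.

```python
Members = set[str]  # alias defined before the class, whose own "set" method shadows the builtin there


class MiniRedis:
    """Hypothetical in-memory imitation of a few Redis string and set commands."""

    def __init__(self) -> None:
        self.strings: dict[str, str] = {}
        self.sets: dict[str, Members] = {}

    def set(self, key: str, value: str) -> str:
        self.strings[key] = value
        return "OK"

    def incrby(self, key: str, amount: int) -> int:
        # Counters are stored as strings; a missing key counts as 0.
        current = self.strings.get(key, "0")
        try:
            number = int(current)
        except ValueError:
            raise ValueError("value is not an integer or out of range") from None
        number += amount
        self.strings[key] = str(number)
        return number

    def incr(self, key: str) -> int:
        return self.incrby(key, 1)

    def sadd(self, key: str, *members: str) -> int:
        # Returns how many members were new; duplicates are ignored.
        target = self.sets.setdefault(key, set())
        before = len(target)
        target.update(members)
        return len(target) - before

    def smembers(self, key: str) -> Members:
        return set(self.sets.get(key, set()))

    def sinter(self, *keys: str) -> Members:
        # Members present in every given set; a missing key acts as an empty set.
        result = self.smembers(keys[0])
        for key in keys[1:]:
            result &= self.smembers(key)
        return result


if __name__ == "__main__":
    r = MiniRedis()
    print(r.set("counter", "100"))  # OK
    print(r.incr("counter"))        # 101
    print(r.incr("counter"))        # 102
    print(r.incrby("counter", 10))  # 112
```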

Page 28

Redis Commands – Lists

$ redis-cli rpush messages "Hello how are you?"
(integer) 1
$ redis-cli rpush messages "Fine thanks. I'm having fun with Redis"
(integer) 2
$ redis-cli rpush messages "I should look into this NOSQL thing ASAP"
(integer) 3
$ redis-cli lrange messages 0 2
1) "Hello how are you?"
2) "Fine thanks. I'm having fun with Redis"
3) "I should look into this NOSQL thing ASAP"

• Chat systems

• Pagination
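LRANGE is what makes Redis lists handy for pagination: the start and stop indexes are both inclusive, negative indexes count from the end of the list (-1 is the last element), and out-of-range indexes are clamped rather than treated as errors. The functions below imitate that indexing on a plain Python list; lrange and page are hypothetical helper names, not a Redis client API.

```python
def lrange(items: list[str], start: int, stop: int) -> list[str]:
    n = len(items)
    # Negative indexes count from the end, as in Redis.
    if start < 0:
        start = max(n + start, 0)
    if stop < 0:
        stop = n + stop
    # Out-of-range indexes are clamped instead of raising errors.
    if stop >= n:
        stop = n - 1
    if start > stop or start >= n:
        return []
    return items[start:stop + 1]  # stop is inclusive in Redis


def page(items: list[str], page_number: int, page_size: int) -> list[str]:
    # Page 0 covers indexes 0..page_size-1, i.e. LRANGE key 0 page_size-1.
    start = page_number * page_size
    return lrange(items, start, start + page_size - 1)


if __name__ == "__main__":
    messages = ["m1", "m2", "m3", "m4", "m5"]
    print(page(messages, 0, 2))     # ['m1', 'm2']
    print(page(messages, 2, 2))     # ['m5']
    print(lrange(messages, -2, -1))  # ['m4', 'm5']
```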

Page 29

Redis – Publish/Subscribe

redis 127.0.0.1:6379> PUBLISH myradioshow "Good morning everyone!"
(integer) 0
redis 127.0.0.1:6379> PUBLISH myradioshow "How ya'll doin tonight?"
(integer) 0
redis 127.0.0.1:6379> PUBLISH myradioshow "Hello? Is anyone listening? I'm not wearing pants."
(integer) 0
redis 127.0.0.1:6379> SUBSCRIBE myradioshow
Reading messages... (press Ctrl-C to quit)
1) "subscribe"
2) "myradioshow"
3) (integer) 1
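Every PUBLISH above returns 0 because nobody was subscribed yet: Redis Pub/Sub is fire-and-forget, delivering a message only to clients subscribed at that moment and storing nothing for later subscribers. A minimal in-process sketch of that behaviour, assuming callbacks stand in for subscribed connections (Broker is a hypothetical class, not a Redis client):

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, str], None]


class Broker:
    """Hypothetical fire-and-forget publish/subscribe broker (no message storage)."""

    def __init__(self) -> None:
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self.subscribers[channel].append(handler)

    def publish(self, channel: str, message: str) -> int:
        # Deliver only to current subscribers and return how many received it,
        # as PUBLISH does. With no subscribers the message is simply dropped.
        handlers = self.subscribers.get(channel, [])
        for handler in handlers:
            handler(channel, message)
        return len(handlers)


if __name__ == "__main__":
    broker = Broker()
    print(broker.publish("myradioshow", "Good morning everyone!"))  # 0
    received: list[str] = []
    broker.subscribe("myradioshow", lambda ch, msg: received.append(msg))
    print(broker.publish("myradioshow", "Is anyone listening?"))   # 1
    print(received)  # ['Is anyone listening?']
```

If messages must survive until a consumer arrives, a list (or another queueing mechanism) is the better fit than Pub/Sub.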

Page 30

Document Database

- CouchDB, MongoDB

Page 31

MongoDB Features

• JSON / BSON support

• RESTful support

• CRUD operations

• SQL-like queries

• Indexing

• Auto-sharding

• Built-in replication and high availability

• Aggregation framework

Page 32

Terminology

Page 33

Sharding

Page 34

MongoDB vs SQL

SQL: SELECT * FROM users
MongoDB: db.users.find()

SQL: SELECT id, user_id, status FROM users
MongoDB: db.users.find( { }, { user_id: 1, status: 1 } )

SQL: SELECT * FROM users WHERE status = 'A'
MongoDB: db.users.find( { status: "A" } )

SQL: SELECT user_id, status FROM users WHERE status = 'A'
MongoDB: db.users.find( { status: "A" }, { user_id: 1, status: 1, _id: 0 } )

SQL: SELECT * FROM users WHERE user_id LIKE '%bc%'
MongoDB: db.users.find( { user_id: /bc/ } )

SQL: SELECT * FROM users WHERE status = 'A' ORDER BY user_id ASC
MongoDB: db.users.find( { status: "A" } ).sort( { user_id: 1 } )

SQL: SELECT * FROM users LIMIT 5 OFFSET 10
MongoDB: db.users.find().limit(5).skip(10)
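To see what the filter, projection, sort, skip and limit arguments in the table do, here is a small sketch that applies them to a list of Python dicts. find is a hypothetical helper, not PyMongo: it supports only equality filters and inclusion projections (fields set to 1), and, like MongoDB, it returns _id unless it is explicitly excluded. It also follows MongoDB in applying sort, then skip, then limit, whatever order the cursor methods are chained in.

```python
from typing import Any, Optional

Doc = dict[str, Any]


def find(docs: list[Doc], query: Optional[Doc] = None,
         projection: Optional[dict[str, int]] = None,
         sort_key: Optional[str] = None, skip: int = 0,
         limit: Optional[int] = None) -> list[Doc]:
    """Hypothetical helper mimicking db.collection.find() on a list of dicts."""
    query = query or {}
    # WHERE: equality match on every field in the query document.
    result = [d for d in docs if all(d.get(k) == v for k, v in query.items())]
    # Sort first (ORDER BY ... ASC), then skip (OFFSET), then limit (LIMIT).
    if sort_key is not None:
        result.sort(key=lambda d: d[sort_key])
    result = result[skip:]
    if limit is not None:
        result = result[:limit]
    if projection:
        # Inclusion projection only: fields set to 1, plus _id unless "_id": 0.
        fields = [k for k, v in projection.items() if v and k != "_id"]
        if projection.get("_id", 1) != 0:
            fields.insert(0, "_id")
        result = [{k: d[k] for k in fields if k in d} for d in result]
    return result


if __name__ == "__main__":
    users = [
        {"_id": 1, "user_id": "abc", "status": "A"},
        {"_id": 2, "user_id": "xyz", "status": "B"},
        {"_id": 3, "user_id": "abd", "status": "A"},
    ]
    # SELECT user_id, status FROM users WHERE status = 'A'
    print(find(users, {"status": "A"}, {"user_id": 1, "status": 1, "_id": 0}))
```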

Page 35

Column Family Stores

- Cassandra, HBase

Page 36

Cassandra Features

• Proven

• Rich Data Model

• Scalable

• Distributed & Decentralized

• High-performance reads/writes

• Fault tolerance

• No single point of failure (SPOF)

• Schema-free

Page 37

Cassandra Cluster

Page 38

Benchmark

Page 39

Architecture

Page 40

Consistency Level

• ANY

• ONE

• TWO

• THREE

• QUORUM

• LOCAL_QUORUM

• EACH_QUORUM

• ALL

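The consistency level says how many replicas must acknowledge a read or write. QUORUM means a majority of the replicas, floor(RF / 2) + 1, where RF is the replication factor; a common rule of thumb is that if the replicas written (W) plus the replicas read (R) exceed RF, every read overlaps at least one replica holding the latest write. The sketch below computes both for a single data centre; the function names are illustrative, and ANY (writes only, which can be acknowledged by storing a hint for an unavailable replica), LOCAL_QUORUM and EACH_QUORUM (the same majority applied within the local data centre or within every data centre) are not modelled.

```python
def quorum(replication_factor: int) -> int:
    # Majority of replicas, e.g. 2 of 3, 3 of 5.
    return replication_factor // 2 + 1


def replicas_for(level: str, replication_factor: int) -> int:
    # Replica acknowledgements required by a few single-data-centre levels.
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": quorum(replication_factor),
        "ALL": replication_factor,
    }
    return levels[level]


def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    # Reads see the latest write when read and write replica sets must overlap.
    return replicas_for(read_level, rf) + replicas_for(write_level, rf) > rf


if __name__ == "__main__":
    rf = 3
    print(quorum(rf))                                      # 2
    print(is_strongly_consistent("QUORUM", "QUORUM", rf))  # True
    print(is_strongly_consistent("ONE", "ONE", rf))        # False
```

The choice is a trade-off: ONE/ONE is fastest and most available but may return stale data, while QUORUM/QUORUM tolerates one failed replica out of three and still reads the latest write.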

Page 41

RDBMSs Support ACID

• Atomicity - a transaction is all or nothing

• Consistency - only valid data is written to the database

• Isolation - concurrent transactions behave as if they ran one after another (serially), so each sees correct data

• Durability - once a transaction is committed, its changes are not lost
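Atomicity can be demonstrated with any relational engine. The sketch below uses Python's built-in sqlite3 module and a hypothetical accounts table with a CHECK constraint: a transfer credits one row and debits another inside one transaction, and when the debit would make a balance negative, the rollback also undoes the credit, so both balances stay as they were.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (id TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    try:
        # "with conn" commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; the credit was rolled back too.
        return False


def balances(conn: sqlite3.Connection) -> dict[str, int]:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    conn = setup()
    print(transfer(conn, "alice", "bob", 30), balances(conn))   # True {'alice': 70, 'bob': 80}
    print(transfer(conn, "alice", "bob", 500), balances(conn))  # False {'alice': 70, 'bob': 80}
```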

Page 42

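NoSQL and the CAP Theorem

A distributed system can guarantee at most two of the following three properties at the same time:

Consistency: all nodes give the same answer

Availability: nodes always give an answer and accept updates

Partition tolerance: the system keeps working even if some nodes become unreachable (go quiet)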

Page 43

Visual Guide to NoSQL Systems

Page 44

Graph Database

- Neo4j, InfoGrid, InfiniteGraph

Page 45

Graph DB

A graph database uses graph structures with nodes, edges, and properties to represent and store data.

Page 46

NoSQL Performance

Page 47

Graph DB Usage Areas

• Recommendations

• Business Intelligence

• Social networking

• MDM (master data management)

• System Management

• Time Series data

• Product Catalogue

• Web Analytics

• Scientific Computing

• Indexing your slow RDBMS

Page 48

Neo4j

Page 49

Neo4j

• Leading Graph Database

• Transaction support (ACID)

• Indexing

• Querying

• REST support

• Disk Based

• Open source

• Traversal framework

• High performance (traverses 1,000,000+ relationships per second)

• Robust (in 24/7 operation since 2003)

• Massive scalability

Page 50

Neo4j Data Model

Neo4j has nodes and relationships. Both nodes and relationships have properties.

Example:

Node1 - properties: name, surname

Node2 - properties: name, surname

Relationship between Node1 and Node2 - type: knows; property: date of meeting
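The same model can be sketched with plain Python structures (this is not the Neo4j API): nodes and relationships each carry a property dictionary, and every relationship has a type and a direction. The Graph class, its method names and the sample people are hypothetical. The friends_of_friends method is a two-hop traversal, the kind of query that follows stored relationships directly instead of joining tables.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Graph:
    nodes: dict[int, dict[str, Any]] = field(default_factory=dict)
    # Adjacency list: node id -> list of (relationship type, target id, properties).
    rels: dict[int, list[tuple[str, int, dict[str, Any]]]] = field(
        default_factory=lambda: defaultdict(list))

    def add_node(self, node_id: int, **props: Any) -> None:
        self.nodes[node_id] = props

    def relate(self, src: int, rel_type: str, dst: int, **props: Any) -> None:
        self.rels[src].append((rel_type, dst, props))

    def neighbours(self, node_id: int, rel_type: str) -> list[int]:
        return [dst for t, dst, _ in self.rels.get(node_id, []) if t == rel_type]

    def friends_of_friends(self, node_id: int) -> set[int]:
        # Two hops along KNOWS, excluding the start node and its direct contacts.
        direct = set(self.neighbours(node_id, "KNOWS"))
        two_hops = {fof for friend in direct for fof in self.neighbours(friend, "KNOWS")}
        return two_hops - direct - {node_id}


if __name__ == "__main__":
    g = Graph()
    g.add_node(1, name="Alice", surname="Smith")
    g.add_node(2, name="Bob", surname="Jones")
    g.add_node(3, name="Carol", surname="Brown")
    g.relate(1, "KNOWS", 2, date_of_meeting="2013-05-01")
    g.relate(2, "KNOWS", 3, date_of_meeting="2014-01-15")
    print([g.nodes[n]["name"] for n in g.friends_of_friends(1)])  # ['Carol']
```

Each hop only touches the relationships of the nodes already reached, so the cost depends on the neighbourhood size rather than on the total number of rows, which is the usual argument for graph databases over join-heavy SQL.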

Page 51

Relational Databases are Graphs!

Page 52

Cypher for Queries

Page 53

Neo4j Performance

http://www.neotechnology.com/2012/10/20-billion-relationships-imported-into-neo4j-on-ec2/

Page 54

Who Uses Neo4j?

• Cisco - Master Data Management

• Telenor Group - customer organization structure (203 million subscribers)

• Deutsche Telekom - social football site (150 million subscribers)

Page 55

OrientDB

• The document-graph database

• ACID support

• SQL and native queries

• Schema-less, schema-full and schema-mixed modes

• Roles + security

• Functions

• HTTP / RESTful / JSON / binary support

• Hooks

• Fetch plans

• Inheritance

• 200,000 inserts per second (6 M node traversals with cache)

Page 56

FluxGraph

• Temporal Graph Database

• Checkpoint support

• Compatible with Neo4j

Page 57

Graphs of Telecommunications

Page 58

CDR Analysis by Graph
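As a hedged illustration of CDR analysis by graph: each call detail record can be treated as a weighted edge between two subscribers, after which questions such as "whom does this subscriber talk to most?" become neighbourhood lookups. The record layout (caller, callee, duration in seconds), the function names and the sample numbers are all assumptions, not a real CDR format.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class CDR(NamedTuple):
    caller: str
    callee: str
    duration_s: int  # assumed field: call length in seconds


def build_call_graph(cdrs: list[CDR]) -> dict[str, Counter]:
    # Undirected, weighted graph: total talk time between each pair of subscribers.
    graph: dict[str, Counter] = defaultdict(Counter)
    for cdr in cdrs:
        graph[cdr.caller][cdr.callee] += cdr.duration_s
        graph[cdr.callee][cdr.caller] += cdr.duration_s
    return graph


def top_contacts(graph: dict[str, Counter], subscriber: str, n: int = 3) -> list[tuple[str, int]]:
    # Strongest ties first; Counter.most_common sorts by total duration.
    return graph.get(subscriber, Counter()).most_common(n)


if __name__ == "__main__":
    records = [CDR("A", "B", 120), CDR("B", "A", 60), CDR("A", "C", 30)]
    graph = build_call_graph(records)
    print(top_contacts(graph, "A"))  # [('B', 180), ('C', 30)]
```

In a graph database the same edges would be stored as relationships with a duration property, so multi-hop questions (for example, the contacts of a subscriber's strongest contacts) stay simple traversals.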

Page 59

Spring Data

Page 60

Spring Data Neo4j

Page 61

NoSQL Usage

• "Cisco is building a master data management system based on Neo4j, and this is actually our first Fortune 500 customer. They found us about two years ago when they tried to build this big, complex hierarchy inside of Oracle RAC. In Oracle RAC, they had response time in minutes, and then when they replaced it [with] Neo4j, they had response times in milliseconds."

Emil Eifrem – Neo4j CEO

• NHS tears out its Oracle Spine in favour of open source

http://www.theregister.co.uk/2013/10/10/nhs_drops_oracle_for_riak/

• AMD: Why we had to evacuate 276TB from Oracle DB to Hadoop

http://www.theregister.co.uk/2014/03/24/amd_hadoop_migration/

Page 62

Statistics

Page 63

Magic Quadrant for Operational Database Management Systems

Page 64

NoSQL Market Size

Page 65

NoSQL Engine Ranking

Page 66

NoSQL in Enterprise Applications

Page 67

Use of NoSQL products

Page 68

Database market share

Page 69

Web Application Architecture

Page 70

Polyglot Persistence

Page 71

Thanks