Overview of NoSQL ...motivation, technologies, should you care?

Upload: sean-murphy

Post on 01-Nov-2014

TRANSCRIPT

Page 1: Overview of no sql

Overview of NoSQL...motivation, technologies, should you care?

Page 2: Overview of no sql

Overview
● Evolution of/motivation for NoSQL databases
● Characterization of NoSQL databases
● Classification of NoSQL databases
● Popularity/usage of NoSQL systems

Page 3: Overview of no sql

A brief history of NoSQL
● Originally coined in 1998 by Strozzi for a specific non-relational database
  ○ easy to use, free, text based data storage, easy manipulation of contents of db
● Reintroduced by Evans (Rackspace) in 2009 for a conference on open source distributed databases
  ○ in response to increase in interest in non-RDBMS solutions
    ■ bringing together Cassandra, Mongo, Couch, etc
● Has grown as a movement over last 3 years

Page 4: Overview of no sql

Current status
● Significant buzz within community in 2010
  ○ initial development of technology
  ○ pioneer deployments
  ○ lots of meetups/conferences/birds-of-a-feather sessions
● Many key technologies evolved in late 2010 and 2011
  ○ more large deployments for some technologies
  ○ small companies with no legacy basing operations on NoSQL

Page 5: Overview of no sql

Current Status
● 2012
  ○ buzz/hype is fading
  ○ technology continues to mature
  ○ increased number of deployments
  ○ skills sought in job market

Page 6: Overview of no sql

NoSQL - a negative definition
● NoSQL simply defined by being non-relational
  ○ diverse set of technologies fall into NoSQL camp
● Motivations mixed
  ○ open source
  ○ scale - TB, PB - particularly for read/write latency
  ○ increased flexibility over RDBMS systems
  ○ ability to work with raw data
  ○ ACID not always most appropriate design choice
    ■ analytics data is excellent example
● Results in many different NoSQL technologies

Page 7: Overview of no sql

Typical characteristics
● Don't use SQL!
● Open Source
● Intended to deliver performance
  ○ in some dimension
● Typically JOIN not supported
  ○ performance hit
● Consistency often relaxed
  ○ eventual consistency
● More flexibility in schema
  ○ if schema used at all!
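
To make "eventual consistency" concrete, here is a minimal sketch of two replicas that briefly disagree and then converge once they exchange updates. It assumes a last-write-wins rule based on caller-supplied timestamps. That rule and the names `Replica` and `sync_from` are illustrative assumptions, not how any particular NoSQL system is implemented.

```python
class Replica:
    """Toy replica using last-write-wins (an assumed conflict rule)."""

    def __init__(self):
        # key -> (timestamp, value)
        self._data = {}

    def write(self, key, value, timestamp):
        # Accept the write only if it is newer than what this replica holds.
        current = self._data.get(key)
        if current is None or timestamp > current[0]:
            self._data[key] = (timestamp, value)

    def read(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Pull every entry from the other replica; newer timestamps win.
        for key, (timestamp, value) in other._data.items():
            self.write(key, value, timestamp)


a, b = Replica(), Replica()
a.write("greeting", "hello", timestamp=1)
b.write("greeting", "hi", timestamp=2)

stale = a.read("greeting")  # replica a has not yet seen the newer write

# After exchanging state in both directions, the replicas agree.
a.sync_from(b)
b.sync_from(a)
```

A reader hitting replica `a` before the sync gets the older value; after the sync both replicas return the same one. That window of disagreement is the trade-off being accepted.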

Page 8: Overview of no sql

Diversity of NoSQL databases
● 122 separate technologies listed on http://nosql-database.org/
  ○ mix of commercial, open source and some in between
● Vary in many dimensions:
  ○ architecture
  ○ interfaces
    ■ api/languages
  ○ internal data storage
  ○ distribution mechanisms
    ■ redundancy, reliability
  ○ usage - deployments & support community
  ○ maturity

Page 9: Overview of no sql

Classification of NoSQL systems
● Column based solutions
● Document store solutions
● Key/Value solutions
● Graph based solutions
● Less significantly:
  ○ XML databases
  ○ Object databases
  ○ Multivalue databases

Page 10: Overview of no sql

Column based solutions
● Structured data
  ○ similar to classical tables
● Generally much more flexible
  ○ no rigorous schema necessary
  ○ can typically add columns in ad hoc fashion
    ■ often without explicitly declaring column
● However, can result in very different usage
  ○ eg can have millions of columns associated with given row
● Examples: Hadoop/HBase, Cassandra, Hypertable, SimpleDB
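
The "add columns in ad hoc fashion" idea can be sketched with a plain in-memory structure in which every row carries its own set of columns. The class `WideRowTable` and its methods are hypothetical names used for illustration only; this is not the API of HBase, Cassandra or any other listed system.

```python
from collections import defaultdict


class WideRowTable:
    """Toy column-family table: each row key maps to its own columns."""

    def __init__(self):
        # row key -> {column name -> value}; no schema is declared up front
        self._rows = defaultdict(dict)

    def put(self, row_key, column, value):
        # Writing to an unseen column simply creates it, for this row only.
        self._rows[row_key][column] = value

    def get(self, row_key, column, default=None):
        return self._rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        # Return this row's column names in sorted order.
        return sorted(self._rows.get(row_key, {}))


table = WideRowTable()
table.put("user:1", "name", "Ada")
table.put("user:1", "email", "ada@example.org")

# A different row can hold entirely different columns, e.g. one per event.
for ts in range(3):
    table.put("events:user:1", f"ts:{ts:04d}", f"click-{ts}")
```

The second row shows the "very different usage" pattern: the column names themselves carry data (here a timestamp), so a single row can grow to a very large number of columns.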

Page 11: Overview of no sql

Document based solutions
● Less structured data
  ○ DB composed of 'documents' containing arbitrary data
    ■ usually containing longer form content eg CMS
● Documents contain some structure to support query/search/filter, etc
● Somewhat less emphasis on a key
  ○ can be autogenerated
● Quite unlike classical databases
● Examples: MongoDB, CouchDB
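
As a rough illustration of the points above, the sketch below stores free-form documents under autogenerated keys and filters them on whatever fields they happen to contain. `DocumentStore`, `insert` and `find` are hypothetical names chosen for this example and are not the MongoDB or CouchDB API.

```python
import uuid


class DocumentStore:
    """Toy document store: schemaless dicts with autogenerated ids."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        # The key is generated by the store; callers rarely pick it.
        doc_id = uuid.uuid4().hex
        self._docs[doc_id] = dict(doc)
        return doc_id

    def find(self, **criteria):
        # Match documents whose fields equal all the given criteria.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


store = DocumentStore()
post_id = store.insert(
    {"type": "post", "title": "Hello", "body": "Long text...", "tags": ["intro"]}
)
store.insert({"type": "comment", "author": "sam", "text": "Nice"})

posts = store.find(type="post")
```

The two documents share no fixed schema, yet both can be queried by field, which is the "some structure to support query/search/filter" that the slide refers to.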

Page 12: Overview of no sql

Key/value stores
● DBs inspired by memcache
  ○ simple, fast key/value stores
● Attempt to retain most of DB in memory
  ○ fast response times
● Different designs for scalability
  ○ single node/multi node
● Much emphasis on the keys in this type of DB
● Write usually overwrites entire previous entry
● Examples: Redis, Couchbase/Membase, DynamoDB, Riak
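
The "write overwrites entire previous entry" behaviour is easy to miss, so here is a minimal in-memory sketch. `KeyValueStore` is a hypothetical class for illustration and does not model the API of Redis or any other listed store.

```python
class KeyValueStore:
    """Toy in-memory key/value store with whole-value writes."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # The new value replaces the old one entirely; nothing is merged.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


kv = KeyValueStore()
kv.put("session:42", {"user": "ada", "cart": ["book"]})

# A second write with a partial value drops the "cart" field.
kv.put("session:42", {"user": "ada"})
```

To change a single field, a client of such a store reads the value, modifies it, and writes the whole value back.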

Page 13: Overview of no sql

Graph based solutions
● Obviously different from previous categories
  ○ Focus specifically on graphs
● Queries supported are graph-specific
  ○ eg get nodes related to specified node
● Typically support for solving standard graph problems
  ○ eg shortest path, general graph traversal
● Can deliver very significant performance gains over non-graph specific solutions
  ○ for graph problems!
● Examples: Neo4j
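
The two query types mentioned above (related nodes, shortest path) can be sketched over a plain adjacency list. This is a generic breadth-first search over an unweighted graph, written for illustration; it is not Neo4j's query interface, and the function names and sample data are assumptions.

```python
from collections import deque


def neighbours(graph, node):
    """Return the nodes directly related to the given node."""
    return graph.get(node, [])


def shortest_path(graph, start, goal):
    """Breadth-first search: fewest-hops path in an unweighted graph."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in previous:
                continue  # already visited
            previous[nxt] = node
            if nxt == goal:
                # Walk back from the goal to the start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None  # goal is not reachable from start


graph = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": ["erin"],
    "erin": [],
}
path = shortest_path(graph, "alice", "erin")
```

In a relational database the same traversal would typically need one self-join per hop, which is where a graph-specific engine can gain its advantage.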

Page 14: Overview of no sql

It's a noisy space...
● Very many candidate technologies
● Relatively small number of real-world solutions
● Difference between classifications above is one of emphasis...
  ○ column based and document based arrive at semi-structured sweet spot from opposite ends of spectrum
● ...although this results in different preferred use cases...
  ○ document based solution better for document problems, eg CMS

Page 15: Overview of no sql

Common techniques used
● Hashing techniques used to map data to nodes in cluster
● Internode communication via Gossip
● Common replication techniques
● Thrift is used in a few cases
● MapReduce often used to search over distributed system
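
One common way to map keys to nodes is a consistent hash ring. The sketch below is a minimal version that assumes SHA-256 as the hash and 100 virtual points per node; both choices and the `HashRing` name are illustrative assumptions, and the slides do not say which hashing scheme any given system uses.

```python
import bisect
import hashlib


def _hash(value):
    # Map a string to a large integer position on the ring.
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring with virtual points per node."""

    def __init__(self, nodes, vnodes=100):
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # A key belongs to the first point clockwise from its own hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[index][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:1001")
```

The appeal of this layout is that removing a node only reassigns the keys that node owned; keys held by the remaining nodes stay where they are.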

Page 16: Overview of no sql

Comparison (oldish)...

Page 17: Overview of no sql

Comparison (oldish)

Page 18: Overview of no sql

Comparison (oldish)

Page 19: Overview of no sql

Horses for courses...
● SQL is perfectly good solution for many problems
  ○ tried and tested
● Some problems require alternative solution
  ○ typically driven by scale and/or flexibility
● NoSQL offers (many) alternatives
  ○ although relatively easy to identify realistic options
● Column based approaches good for mostly structured data with enhanced flexibility
● Document based approaches good for document oriented problems
● Key/Value mostly intended for rapid response on more modest data sets

Page 20: Overview of no sql

...so let's dive into one NoSQL database...
● Cassandra...