A Walk down NOSQL Lane in the Cloud. New York City Cloud Computing Group, February 2011. Alexander Sicular (@siculars)

Upload: siculars

Post on 17-May-2015

DESCRIPTION

Introduction to NOSQL and various NOSQL solutions.

TRANSCRIPT

Page 1: A walk down NOSQL Lane in the cloud

A Walk down NOSQL Lane in the Cloud

New York City Cloud Computing Group, February 2011

Alexander Sicular, @siculars

Page 2: A walk down NOSQL Lane in the cloud

Who is this blowhard?

Columbia University pays my mortgage

For the better part of a decade in Medical Informatics

Am not shilling for any of these companies

Am not a computer scientist

Am a computer science enthusiast particularly in the area of Informatics

Page 3: A walk down NOSQL Lane in the cloud

When I put my data in the “cloud”, to me it just means that it’s virtualized in someone else’s server room

Page 4: A walk down NOSQL Lane in the cloud

Many, many providers and only growing

Amazon, Rackspace, Joyent, CouchOne, Cloudant, Azure, GAE, Heroku, no.de

Outsourced management

Zero capex

Controlled costs

...the Silver Lining

Page 5: A walk down NOSQL Lane in the cloud

...With a Chance of Rain?

Vendor lock in

Unreliable performance

i/o

cpu, memory

Bare metal > software virtualization

Page 6: A walk down NOSQL Lane in the cloud

NoSQL or NOSQL?

Not Only SQL

Non/post relational

Big tent policy

Umbrella term

Fragmented

http://www.flickr.com/photos/morgennebel/2933723145/

Page 7: A walk down NOSQL Lane in the cloud

Your Usage Patterns

Read vs. Write

Mutable vs. Immutable

Product Considerations:

In place updates

Write Only Logs
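
To make the "in place updates" versus "write only logs" distinction concrete, here is a minimal sketch of the log approach (often called an append-only log). It is an illustration under assumptions, not how any product on these slides is implemented: the class name `AppendOnlyStore`, the JSON record layout, and the in-memory offset index are all hypothetical.

```python
import json
import os
import tempfile


class AppendOnlyStore:
    """Toy key-value store that never rewrites old records (hypothetical)."""

    def __init__(self, path):
        self.path = path
        self.index = {}  # key -> byte offset of the latest record for that key
        open(self.path, "ab").close()  # create the log file if it is missing

    def put(self, key, value):
        record = (json.dumps({"k": key, "v": value}) + "\n").encode("utf-8")
        with open(self.path, "ab") as log:
            log.seek(0, os.SEEK_END)
            offset = log.tell()  # the new record starts at the current end
            log.write(record)
        self.index[key] = offset  # older records stay in the file untouched

    def get(self, key):
        with open(self.path, "rb") as log:
            log.seek(self.index[key])
            return json.loads(log.readline())["v"]


path = os.path.join(tempfile.mkdtemp(), "data.log")
store = AppendOnlyStore(path)
store.put("user:1", "alice")
store.put("user:1", "alicia")  # an "update" is just another appended record
latest = store.get("user:1")
with open(path, "rb") as f:
    record_count = len(f.readlines())  # both versions are still on disk
```

The trade-off this sketch hints at: writes are sequential and cheap, but stale records pile up until something compacts the file, whereas an in-place design pays for random writes and keeps only one copy.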

Page 8: A walk down NOSQL Lane in the cloud

This vs. That

Riak wiki comparisons page: http://wiki.basho.com/Riak-Comparisons.html

Popular one-page comparison of a number of NOSQL players by Kristof Kovacs: http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Page 10: A walk down NOSQL Lane in the cloud

Why NOSQL

Support for “Very Large” data sets

Schemaless

Denormalized

Green field

New applications

http://www.flickr.com/photos/gailtang/1243984297/

Page 11: A walk down NOSQL Lane in the cloud

Academia

Google:

Bigtable http://labs.google.com/papers/bigtable.html

GFS http://labs.google.com/papers/gfs.html

M/R http://labs.google.com/papers/mapreduce.html

Amazon:

Dynamo http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf

NOSQL Summer http://nosqlsummer.org/papers
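
For readers who have not opened the M/R paper, the programming model can be sketched in a few lines. This is a single-process illustration of the map, shuffle, and reduce steps using word count as the example; the function names are hypothetical, and a real framework would distribute each phase across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit one (key, value) pair per word.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group every value emitted for the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Collapse all values for one key into a single result.
    return key, sum(values)


def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = map_reduce(["the cloud", "the NOSQL cloud", "the end"])
```

Because each map call looks at one document and each reduce call looks at one key, both phases can run in parallel, which is the property the batch systems later in this deck build on.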

Page 12: A walk down NOSQL Lane in the cloud

Under the Hood Terminology

Write Only Log http://en.wikipedia.org/wiki/Log-structured_file_system

Merkle Trees http://en.wikipedia.org/wiki/Hash_tree

B-trees http://en.wikipedia.org/wiki/B-tree

Vector clock http://en.wikipedia.org/wiki/Vector_clock

Bloom filters http://en.wikipedia.org/wiki/Bloom_filters

Big O Notation http://en.wikipedia.org/wiki/Big_o_notation

Consistent Hashing http://en.wikipedia.org/wiki/Consistent_hashing
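
Of the terms above, consistent hashing is the one that most directly explains how distributed stores spread keys over nodes, so a small sketch may help. It assumes MD5 as the hash and 64 virtual points per node; the `HashRing` class and both of those choices are illustrative assumptions, not taken from any product on these slides.

```python
import bisect
import hashlib


def _hash(value):
    # Map any string to a position on the ring (a large integer).
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring with virtual nodes (hypothetical)."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owner[point] = node

    def lookup(self, key):
        # A key belongs to the first point clockwise from its hash,
        # wrapping around to the start of the ring.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

The point of the exercise: when `node-d` joins, the only keys that change owner are the ones that move to `node-d`. With a plain `hash(key) % node_count` scheme, most keys would be reshuffled.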

Page 13: A walk down NOSQL Lane in the cloud

CAP Theorem

http://en.wikipedia.org/wiki/CAP_theorem

Consistency

Availability

Partition Tolerance

Pick two?

http://guide.couchdb.org/draft/consistency.html

Page 14: A walk down NOSQL Lane in the cloud

CouchDB

CouchOne, Cloudant

Erlang

Extreme replication scenarios

Works on phones

Updated indexing (b-tree)

HTTP interface

Offline usage

Sharded scaling

Page 15: A walk down NOSQL Lane in the cloud

CouchDB Internal Architecture

http://nosqlpedia.com/wiki/File:CouchDB-Arch.JPG

Page 16: A walk down NOSQL Lane in the cloud

MongoDB

10gen, MongoHQ, MongoLab

C++

huMONGOus

Sharded scaling, replicated master/slave

Located in NYC (go visit them)

Soft landing for those coming from mysql (relational databases)

Native javascript

Secondary indexes

Page 17: A walk down NOSQL Lane in the cloud

MongoDB Sharding Diagram

http://www.snailinaturtleneck.com/blog/2010/03/30/sharding-with-the-fishes/

Page 18: A walk down NOSQL Lane in the cloud

MySQL to Mongo Query similarity

http://nosqlpedia.com/wiki/File:MongoDB.JPG
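
Since the comparison on this slide is an image, here is a small stand-in for the idea. The SQL side runs against SQLite (standing in for MySQL so the example needs no server), and the document side uses a MongoDB-style filter, which in the mongo shell would be written `db.users.find({age: {$gt: 30}})`. The `matches` helper is a hypothetical toy that understands only equality and `$gt`; it is not MongoDB's query engine.

```python
import sqlite3

rows = [
    {"name": "ann", "age": 34},
    {"name": "bob", "age": 27},
    {"name": "cy", "age": 41},
]

# Relational side: SQLite here, standing in for MySQL.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, age INTEGER)")
db.executemany("INSERT INTO users VALUES (:name, :age)", rows)
sql_names = [r[0] for r in db.execute(
    "SELECT name FROM users WHERE age > 30 ORDER BY name")]

# Document side: the same predicate as a MongoDB-style filter document.
mongo_filter = {"age": {"$gt": 30}}


def matches(doc, query):
    """Hypothetical toy matcher: supports equality and $gt only."""
    for field, cond in query.items():
        if isinstance(cond, dict):
            if "$gt" in cond and not (field in doc and doc[field] > cond["$gt"]):
                return False
        elif doc.get(field) != cond:
            return False
    return True


mongo_names = sorted(d["name"] for d in rows if matches(d, mongo_filter))
```

Both sides select the same two users, which is the "soft landing" the previous slide mentions: the query is a data structure rather than a string, but the shape of the question is familiar.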

Page 19: A walk down NOSQL Lane in the cloud

Riak

Basho, Joyent

Erlang

Distributed

HTTP, protobuf

Native javascript, erlang

Multiple backends

Homogeneous

CAP tunable
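
"CAP tunable" is easiest to see through the N, R, W parameters from the Dynamo paper: N replicas per key, R replicas consulted on a read, W replicas acknowledged on a write. The sketch below checks by brute force whether every possible read set shares at least one replica with every possible write set. It is a simplified model that assumes fixed replica sets and ignores failures and concurrent writes; the function name is hypothetical.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """True if any r-replica read must touch at least one of any w written replicas."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


strict = quorums_overlap(3, 2, 2)      # R + W > N: reads see the latest write
fast = quorums_overlap(3, 1, 1)        # R + W <= N: a read can miss the write
write_all = quorums_overlap(3, 1, 3)   # cheap reads, expensive writes
```

Lowering R or W buys latency and availability at the cost of possibly stale reads; raising them does the reverse. That dial is what the slide means by tunable.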

Page 20: A walk down NOSQL Lane in the cloud

Hadoop

Cloudera, Apache Foundation

Java

High latency

Batch oriented

HDFS is GFS based

Open source Google stack via the Google papers

Huge ecosystem

Yahoo, FB, Twitter, Fortune 500

Pig, Hive, Flume

Page 21: A walk down NOSQL Lane in the cloud

HBase

Java

Low latency store

sits on top of Hadoop

Modeled after Google Bigtable

Column oriented

Thrift, protobuf

Backend for new Facebook Messaging service

Page 22: A walk down NOSQL Lane in the cloud

Cassandra

Apache

Java

Column oriented

Like Bigtable and Dynamo

Originated at Facebook

At Twitter, distributed counting

http://www.infoq.com/presentations/NoSQL-at-Twitter-by-Ryan-King

http://www.slideshare.net/kevinweil/rainbird-realtime-analytics-at-twitter-strata-2011

Page 23: A walk down NOSQL Lane in the cloud

Redis

OpenRedis

C

REmote DIctionary Server

Specific data structures

incredibly fast

memcached on steroids

replicated master/slave

Page 24: A walk down NOSQL Lane in the cloud

Commonalities

Open Source

Adherence to common or standard:

data formats

json, bson, utf8, binary

data transport mechanisms

http, thrift, protobuf, simple wire protocols

Page 25: A walk down NOSQL Lane in the cloud

Ok. So Now What?

Analyze your requirements

Mailing lists

IRC, twitter

Project pages, wiki

Github/Google Code/Bitbucket:

project page

specific language clients

Page 26: A walk down NOSQL Lane in the cloud

Variety Pack

Hybrid architectures will become the norm

Twitter - mysql, cassandra, hadoop

Google - mysql, GAE (BT)

Facebook - mysql, cassandra, hbase, memcached

Yahoo - mysql, hadoop

LinkedIn - voldemort

http://www.flickr.com/photos/uncleweed/82245324/

Page 27: A walk down NOSQL Lane in the cloud

Questions?

New York City Cloud Computing Group, February 2011

Alexander Sicular, @siculars