NoSQL

NOSQL, Eric Marshall, April 7th, 2016, for LOPSA NJ



I'M ERIC

I work at Airisdata, and we are hiring: http://airisdata.com

WHAT'S WRONG WITH RELATIONAL DATABASES?

Nothing :)

Google & Amazon (followed by the rest of web tech) were looking for:
• Higher performance
• Larger scale
• Lower cost
• New capabilities

BUT FIRST, A POOR METAPHOR

Cars: what leads to better performance?

• Bigger engine; remove excess weight/features

• Better controls/steering/braking

WAIT, I WANT MORE PERFORMANCE

We can go faster!

BUT YOU CAN'T MOVE IKEA FURNITURE

Feature loss: is it still a car?

WELL, WE CAN SOLVE THAT PROBLEM

Also, it has a very powerful engine.

THE CHALLENGE OF PERFORMANCE

<add wisdom> A long-winded way to say that "NoSQL" is a poor label.

SO WHAT IS NOSQL? 'UM, NON-RELATIONAL'

No good definitions to be found. For me:
• Scales horizontally
• Forgoes the "old school" SQL relations, concurrency, etc.

"Exactly like SQL (except where it's not)": trades in or reimagines most SQL features for "something else".

• Developer friendly/developer driven
• Schema-loose, semi-structured
• Usually open source, and usually associated with web infrastructure
• Ignoring older non-relational databases of the past
• Scales horizontally (usually) - did I mention that?
• Can be "glued" to other data stores

Don't like mine? Create your own definition :)

SIDEBAR: OBSERVATION ON SOFTWARE TEAMS

Software teams tied to a large central relational database (think 1990s/2000s): the large relational database "glues" teams and apps together, which leads to complex databases and DBAs.

Vs. software teams using NoSQL: independent except at the edges (input/logs & output/reports).

FOWLER'S IMPEDANCE MISMATCH

Java objects vs. rows in tables.

What I have called Fowler's impedance mismatch is discussed in his and Sadalage's book NoSQL Distilled.

Most NoSQL beasties can store data in more interesting ways.
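To make the mismatch concrete, here is a minimal Python sketch (Python stands in for Java, and the `Order`/`LineItem` classes and their fields are hypothetical). The same object is either split into a parent row plus child rows that have to be re-joined on `order_id`, or kept whole as one nested, JSON-ready aggregate, which is closer to what a document store keeps.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class LineItem:
    sku: str
    qty: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: List[LineItem] = field(default_factory=list)


def to_rows(order):
    """Relational view: one parent row plus one child row per line item."""
    order_row = (order.order_id, order.customer)
    item_rows = [(order.order_id, it.sku, it.qty) for it in order.items]
    return order_row, item_rows


def from_rows(order_row, item_rows):
    """Rebuilding the object means re-joining child rows on order_id."""
    order_id, customer = order_row
    items = [LineItem(sku, qty) for oid, sku, qty in item_rows if oid == order_id]
    return Order(order_id, customer, items)


def to_document(order):
    """Aggregate view: the whole object graph as one nested dict."""
    return asdict(order)
```

The object round-trips through both views; only the relational one needs the extra join step to put it back together.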

CAP

Here because management loves to chat endlessly about it.

C is for Consistency: "This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time."

Most systems are not (exactly).

A is for Availability: "For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. … even when severe network failures occur, every request must terminate."

I think everyone here understands this one :)

P is for Partition Tolerance: "In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another."

Quotes from "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" by Gilbert and Lynch.

YOU CAN HAVE TWO

Consistency: the system may shut down or take a day to answer, but you will have the correct answer.

Availability: the system will always answer; you might get your checking balance from last year instead of today's balance, but you will get an answer.

Like asking a research group vs. asking folks in the pub. Can't have both :(

One can accept the write not knowing if all the servers are up, OR you can refuse it until you know all the servers are up. Partition tolerance is mandatory in distributed systems:

https://codahale.com/you-cant-sacrifice-partition-tolerance/
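The choice above can be played out with a toy two-replica register. This is a hypothetical Python model for illustration only, not any real database: writes arrive at replica `a`, reads are served by replica `b`, and `partitioned` means the two cannot talk. In "CP" mode the system refuses rather than risk a wrong answer; in "AP" mode it always answers, possibly with last year's balance.

```python
class ToyCluster:
    """Two replicas (a, b) of one checking-account balance (hypothetical model)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = None  # balance held by replica a (receives writes)
        self.b = None  # balance held by replica b (serves reads)
        self.partitioned = False

    def write(self, balance):
        if not self.partitioned:
            self.a = self.b = balance  # both replicas updated together
            return "ok"
        if self.mode == "CP":
            return "refused"  # cannot confirm b got the write, so say no
        self.a = balance  # AP: accept locally; b lags until the partition heals
        return "ok"

    def read(self):
        if self.partitioned and self.mode == "CP":
            return "unavailable"  # b cannot prove it is current
        return self.b  # AP: always answers, possibly with a stale value

    def heal(self):
        self.partitioned = False
        self.b = self.a  # naive repair: copy a's value over to b
```

Usage: after `ap = ToyCluster("AP"); ap.write(100); ap.partitioned = True; ap.write(50)`, a read still returns the old 100 until `ap.heal()` runs.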

ONLY TWO: THE FINE PRINT

Only two at any moment in time :) For some systems (Cassandra, Riak), you can choose different pairs for each operation.

WHY WOULD ANYONE BE INCONSISTENT?

• Speed while highly concurrent: "good now is better than perfect later", i.e. don't block
• Handling "partition cases", i.e. part of the system/network is down

DB CHEMISTRY - MORE BUZZ

Is it ACID or BASE?
• ACID: Atomicity, Consistency, Isolation, Durability
• BASE: Basically Available, Soft state, Eventually consistent

See "The Transaction Concept: Virtues and Limitations" by Jim Gray.
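For contrast with BASE, here is the "A" in ACID on a single-node relational database, sketched with Python's built-in `sqlite3` module (the table, account names, and `transfer` helper are made up). The credit runs first; when the debit then violates the `CHECK` constraint, the whole transaction rolls back, so money is never created out of thin air.

```python
import sqlite3


def make_bank():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE acct (name TEXT PRIMARY KEY, bal INTEGER CHECK (bal >= 0))")
    db.executemany("INSERT INTO acct VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    db.commit()
    return db


def transfer(db, src, dst, amount):
    """Credit then debit inside one transaction; all or nothing."""
    try:
        with db:  # commits on success, rolls back if an exception escapes
            db.execute("UPDATE acct SET bal = bal + ? WHERE name = ?", (amount, dst))
            db.execute("UPDATE acct SET bal = bal - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # CHECK (bal >= 0) failed on the debit
        return False


def balances(db):
    return dict(db.execute("SELECT name, bal FROM acct"))
```

A failed transfer of 500 from an account holding 70 leaves both balances exactly as they were, even though the credit to the other account had already executed.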

HOW TO DISTRIBUTE THE DATA

• Option 1: shard
• Option 2: replicate
• Option 3: do both
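The three options can be sketched as placement functions. This is illustrative Python, not any particular database's scheme: `shard_for` and `replicas_for` are hypothetical names, and MD5 is used only as a stable, convenient hash. Sharding puts each key on exactly one node; sharding plus replication adds the next `copies - 1` nodes; with `copies == num_nodes` every node holds every key (pure replication).

```python
import hashlib


def shard_for(key, num_shards):
    """Option 1: shard. Each key lives on exactly one shard, chosen by hash."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def replicas_for(key, num_nodes, copies):
    """Option 3: shard + replicate. Primary shard plus the next copies-1 nodes."""
    first = shard_for(key, num_nodes)
    return [(first + i) % num_nodes for i in range(copies)]
```

One design caveat: plain `hash % N` placement moves most keys whenever N changes, which is the churn that the consistent hashing used by Riak and Cassandra is meant to avoid.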

WHAT WOULD LINNAEUS SAY?

• Key-value
• Graph DB
• Document
• Columnar (aka BigTable)

Disclaimer: heavy overlap

https://en.wikipedia.org/wiki/Linnaean_taxonomy

COLUMNAR STORES

• Inspired by Google's Bigtable
• Funky row/column setups

COLUMNAR EXAMPLES

http://db-engines.com/en/ranking_trend/wide+column+store

KEY-VALUE STORES

• Designed for speed (even memory-only)
• High load
• Global data model of key-values (surprise!)
• Ring partitioning and replication

KEY VALUE EXAMPLES

http://db-engines.com/en/ranking_trend/key-value+store

DOCUMENT STORES

Similar to key-value, but the value is a document.

• Document is stored in JSON (or similar)
• Flexible schema
• Some support keys/references/indices

{"date": [2016, 4, 1], "booktitle": "The Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams"}
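A tiny in-memory stand-in for a document store, in Python, shows both the flexible schema and querying by field. The `put`/`find` helpers, the `book:N` keys, and the second document's `format` field are hypothetical; real document stores use indices rather than the full scan done here.

```python
import json

docs = {}  # key -> JSON text; a stand-in for a document store


def put(key, doc):
    docs[key] = json.dumps(doc)


def find(**criteria):
    """Scan every document; return keys whose fields match all criteria."""
    hits = []
    for key, text in docs.items():
        doc = json.loads(text)
        if all(doc.get(f) == v for f, v in criteria.items()):
            hits.append(key)
    return sorted(hits)


put("book:1", {"date": [2016, 4, 1],
               "booktitle": "The Hitchhiker's Guide to the Galaxy",
               "author": "Douglas Adams"})
# Flexible schema: a second document with a different set of fields.
put("book:2", {"booktitle": "Mostly Harmless", "author": "Douglas Adams",
               "format": "paperback"})
```

`find(author="Douglas Adams")` matches both documents, while `find(format="paperback")` matches only the one that has that field at all.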

DOCUMENT EXAMPLES

http://db-engines.com/en/ranking_trend/document+store

GRAPH DATABASES

Remember your data structures class in college?

• Edges and vertices: both can hold data
• Reduces tough SQL queries to simple graph queries
• Easier to model: "matches the whiteboard"
• Relationships between vertices are first class
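As an example of a "tough SQL query", take "everyone within N hops of Ann": in SQL each hop is typically another self-join on a friendships table, while on a graph it is a plain breadth-first search. A minimal Python sketch over a hypothetical social graph, with properties stored on the edges:

```python
from collections import deque

# Hypothetical graph: vertex -> {neighbor: edge properties}.
graph = {
    "ann": {"bob": {"since": 2010}, "cat": {"since": 2014}},
    "bob": {"dan": {"since": 2012}},
    "cat": {"dan": {"since": 2015}, "eve": {"since": 2011}},
    "dan": {},
    "eve": {"fay": {"since": 2016}},
    "fay": {},
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: vertices reachable from start in 1..max_hops edges."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # don't expand past the hop limit
        for neighbor in graph.get(node, {}):
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return {n for n, d in depth.items() if d > 0}


def edges_since(graph, year):
    """Query on edge data: (a, b) pairs whose 'since' property is >= year."""
    return sorted((a, b) for a, nbrs in graph.items()
                  for b, props in nbrs.items() if props["since"] >= year)
```

Going from two hops to three is just a different argument here, not another join.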

GRAPH DB EXAMPLES

http://db-engines.com/en/ranking_trend/graph+dbms

HBASE

NoSQL on top of Hadoop

SITS ON TOP OF HDFS

• Name nodes
• Data nodes
• Replication
• And the rest of that whole megillah

• Column-oriented
• Handles "wide", "sparse" tables well
• Fault tolerant
• Supports Java, REST, Avro, and Thrift

All operations are atomic at the row level (with a write-ahead log for durability).

KIND OF SQL

• Key-values: keys are arbitrary strings (byte arrays, strictly); values are an entire row of data
• No joins
• Apache Phoenix provides a JDBC interface

COLUMN FAMILIES

• A column's full name = family name + column qualifier (e.g. info:name)
• Each column family's performance is configured independently
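A toy model of the data model above, in Python. `ToyTable` is hypothetical and is not the HBase client API: rows are sparse maps keyed by `family:qualifier`, each cell keeps timestamped versions, and only declared column families are accepted (in HBase, families are fixed in the table schema while qualifiers are free-form).

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory model of an HBase-style table (not the HBase API)."""

    def __init__(self, families):
        self.families = set(families)  # column families are fixed up front
        # row key -> "family:qualifier" -> list of (timestamp, value), newest first
        self.rows = defaultdict(dict)

    def put(self, row, column, value, ts):
        family = column.split(":", 1)[0]
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # keep newest first

    def get(self, row, column):
        """Latest version of one cell, or None if the cell was never written."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None
```

Only cells that were actually written take up space, which is why wide, sparse tables are cheap in this model.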

REGIONS

Look like shards: different key ranges per box, no overlap.
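Because regions hold sorted, non-overlapping key ranges, finding the region for a row key is a binary search over the region start keys. The layout and the server names below are made up for illustration; a real client learns region boundaries from cluster metadata.

```python
import bisect

# Hypothetical layout: region i owns row keys in [starts[i], starts[i + 1]).
REGION_STARTS = ["", "g", "n", "t"]  # sorted; the first region starts at ""
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def region_for(row_key):
    """Index of the last region whose start key is <= row_key."""
    return bisect.bisect_right(REGION_STARTS, row_key) - 1


def server_for(row_key):
    return REGION_SERVERS[region_for(row_key)]
```

A key equal to a start key (such as "g") belongs to the region that starts there, because each range is closed at the start and open at the end.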

CASSANDRA

Tunable NoSQL

CAP WITH QUORUMS/KNOB TWEAKING

• Symmetric peer to peer
• Linearly scalable
• Replication
• Eventual consistency
• Partitioning

CAP WITH QUORUMS/KNOB TWEAKING

Some systems let you choose per event (per operation). Three knobs:
• Replication amount (N): how many copies of each piece of data
• How many successful writes == "your write to the database is done" (W)
• How many successful reads, out of the full set, == "here is your data" (R)

The higher the values, the longer the wait.
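The usual rule of thumb for these knobs: with N replicas, W write acknowledgements, and R read replies, if R + W > N then every read set overlaps every write set, so a read can see the latest acknowledged write. A minimal Python sketch (the function and its key names are hypothetical); it assumes strict quorums, and sloppy quorums with hinted handoff, as in Riak, can weaken the guarantee.

```python
def quorum_profile(n, w, r):
    """Summarize a Dynamo-style N/W/R setting under strict quorums."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("need 1 <= W <= N and 1 <= R <= N")
    return {
        # Any W replicas and any R replicas must share at least one node.
        "read_sees_latest_write": r + w > n,
        # How many replicas can be down while writes/reads still succeed.
        "writes_tolerate_down": n - w,
        "reads_tolerate_down": n - r,
    }
```

For example, N=3, W=2, R=2 gives overlapping quorums while tolerating one down node on each path; N=3, W=1, R=1 is faster and more available but can return stale data.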

GOSSIP AND PEERING

• Who's up?
• Passing requests
• Handling missing nodes
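One common way to answer "who's up?" without a central registry is heartbeat gossip: each node bumps its own counter, periodically swaps its table with a random peer, and both keep the highest counter seen per node. This is a simplified push-pull sketch in Python, not Cassandra's actual gossip implementation; node names, the round structure, and the seeded RNG are assumptions.

```python
import random


def merge(mine, theirs):
    """Keep the highest heartbeat counter seen for each node."""
    for node, beat in theirs.items():
        if beat > mine.get(node, -1):
            mine[node] = beat


def gossip_round(views, rng):
    """Each node bumps its own heartbeat, then swaps tables with one random peer."""
    for node in views:
        views[node][node] = views[node].get(node, 0) + 1
    for node in list(views):
        peer = rng.choice([n for n in views if n != node])
        merge(views[node], views[peer])  # pull
        merge(views[peer], views[node])  # push


def rounds_to_converge(n, seed=0, max_rounds=100):
    """Rounds until every node has heard of every other node (None if never)."""
    rng = random.Random(seed)
    views = {f"n{i}": {} for i in range(n)}
    for rnd in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(len(v) == n for v in views.values()):
            return rnd
    return None
```

A node whose counter stops increasing in everyone's table is the natural candidate for "missing"; that failure-detection step is left out here to keep the sketch short.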

DATA

• Column families
• Keys and values
• Speed via appending data and timestamps
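Appending plus timestamps means a write never has to read or lock the old value: it just appends a new (column, timestamp, value) entry, and a reader keeps the newest timestamp per column ("last write wins"). A Python sketch with a hypothetical log format; real Cassandra stores cells in memtables and SSTables, compacts them, and resolves timestamp ties differently from this sketch, which keeps the first entry seen.

```python
def write(log, column, value, ts):
    """Never overwrite in place: just append a timestamped entry."""
    log.append((column, ts, value))


def read(log):
    """Reconcile: for each column, the entry with the newest timestamp wins."""
    latest = {}
    for column, ts, value in log:
        if column not in latest or ts > latest[column][0]:
            latest[column] = (ts, value)
    return {column: value for column, (ts, value) in latest.items()}
```

Because the merge only compares timestamps, combining two replicas' logs in either order gives the same answer, which is how replicas that diverged can converge later.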

KEY VALUE: DYNAMO

• Replication
• REST
• Protocol Buffers for queries
• Tunable consistency

RIAK

• Simple interface
• High write-availability
• Linear scaling
• REST API via HTTP: PUT, GET, DELETE, POST, etc.
• Or Protobufs for quicker serialized data
• "Hundreds of nodes"

DISTRIBUTED

• Consistent hashing
• Vector clocks
• Sloppy quorums
• Virtual nodes (not machines but lightweight processes; more like having eggs in many baskets, which makes it easier to hand the eggs to other folks during a failure)
• Hinted handoff ("please pass along")
• Replication

Request -> Riak node (<- asks other nodes ->) -> virtual node -> virtual node -> ... -> each virtual node's data store

And then the answers return back up the stack.
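Vector clocks let Riak tell whether one version of a value descends from another, or whether two writes happened concurrently (in which case Riak can keep both as "siblings" for the application to resolve). A minimal Python sketch with dict-based clocks; the function names are hypothetical and this is not Riak's internal representation.

```python
def increment(clock, node):
    """Return a new clock with node's counter bumped (the original is untouched)."""
    new = dict(clock)
    new[node] = new.get(node, 0) + 1
    return new


def compare(a, b):
    """'equal', 'before' (a happened before b), 'after', or 'concurrent'."""
    nodes = set(a) | set(b)
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


def merge_clocks(a, b):
    """Clock that descends from both a and b (element-wise max)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in set(a) | set(b)}
```

Two updates that both start from the same version but go through different nodes come out "concurrent"; after a merge, the merged clock is "after" both of them.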

SERVERS

• "Just add more" servers
• Ring architecture: all nodes are peers
• Gossip protocols
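A generic consistent-hashing ring with virtual nodes, sketched in Python, shows why "just add more servers" is cheap: adding a node only takes over the keys that land on its own points on the ring. This is an illustration, not Riak's exact scheme (Riak divides its ring into a fixed number of partitions that nodes claim); `Ring`, the vnode count, and the SHA-1-based hash are assumptions.

```python
import bisect
import hashlib


def ring_hash(s):
    return int.from_bytes(hashlib.sha1(s.encode()).digest()[:8], "big")


class Ring:
    """Consistent-hash ring where each physical node owns several virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=8):
        self.points = sorted((ring_hash(f"{node}#{i}"), node)
                             for node in nodes for i in range(vnodes_per_node))
        self.hashes = [p for p, _ in self.points]

    def preference_list(self, key, n):
        """First n distinct physical nodes found walking clockwise from the key."""
        start = bisect.bisect(self.hashes, ring_hash(key))
        owners = []
        for j in range(len(self.points)):
            node = self.points[(start + j) % len(self.points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners
```

Usage: with `Ring(["a", "b", "c", "d"])`, `preference_list("user:42", 3)` names three distinct nodes to hold the replicas; after adding node "e", every key either keeps its old primary or moves to "e", and nothing else shuffles.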

KEYS AND BUCKETS

Riak can create them automatically (a POST to a bucket returns the generated key to you).

http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true   (gets all the keys)
http://SERVER:PORT/riak/BUCKET?keys=stream   (better for huge sets of data)

You can store your code in a bucket.
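The URL scheme above can be wrapped in a few helpers. This Python sketch only builds `urllib.request.Request` objects and does not send them; passing one to `urllib.request.urlopen` would, against a running node. Assumptions: the legacy `/riak/BUCKET/KEY` scheme shown on the slide (newer Riak releases also expose `/buckets/BUCKET/keys/KEY`), `localhost:8098` as the node address (8098 was Riak's default HTTP port), and the helper names themselves, which are hypothetical.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://localhost:8098/riak"  # assumed host/port for illustration


def object_url(bucket, key=None, **params):
    """/riak/BUCKET[/KEY][?params], with path parts percent-encoded."""
    url = f"{BASE}/{quote(bucket, safe='')}"
    if key is not None:
        url += f"/{quote(key, safe='')}"
    if params:
        url += "?" + urlencode(params)
    return url


def store(bucket, key, body, content_type="application/json"):
    """PUT to a known key, or POST to the bucket and let Riak pick the key."""
    method = "PUT" if key is not None else "POST"
    return Request(object_url(bucket, key), data=body.encode(), method=method,
                   headers={"Content-Type": content_type})


def list_keys(bucket, stream=False):
    """GET all keys in a bucket; streaming is kinder to huge buckets."""
    return Request(object_url(bucket, keys="stream" if stream else "true"))
```

Listing keys walks the whole cluster, which is why the streaming variant is the one to reach for on big buckets.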

LINKS

curl [other options] -H 'Link: </riak/BUCKET/KEY>; riaktag="tagname"'

Link walking: following those tagged links from object to object lets you build other structures on top of plain key/value.
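A Python sketch of the Link header format shown above, plus a purely local imitation of link walking over an in-memory dict. The helper names and the sample buckets, keys, and tags are hypothetical; real link walking is done by the Riak server, not by code like `walk`.

```python
import re


def link_header(links):
    """links: list of (bucket, key, tag) -> value for a Riak-style Link header."""
    return ", ".join(f'</riak/{b}/{k}>; riaktag="{t}"' for b, k, t in links)


LINK_RE = re.compile(r'</riak/([^/>]+)/([^>]+)>;\s*riaktag="([^"]*)"')


def parse_links(header):
    """Inverse of link_header: list of (bucket, key, tag) tuples."""
    return LINK_RE.findall(header)


def walk(store, start, tags):
    """Follow links tag by tag from start; store maps (bucket, key) -> links."""
    frontier = [start]
    for tag in tags:
        frontier = [(b, k) for obj in frontier
                    for (b, k, t) in store.get(obj, []) if t == tag]
    return frontier
```

Chaining tags ("friend", then "likes") is what turns flat key/value pairs into something graph-shaped.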

HOMEWORK AND OTHER READINGS: GENERAL

• Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf

• Vogels' thoughts on "Eventually Consistent": http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

• Old-school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf

• ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p

• All your BASE: Dan Pritchett, "BASE: An Acid Alternative", http://queue.acm.org/detail.cfm?id=1394128

• NoSQL Distilled by Sadalage and Fowler
• Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS, CONT'D

• Google's Bigtable: https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

• HBase: The Definitive Guide by Lars George
• HBase in Action by Dimiduk and Khurana
• Hadoop: The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS, CONT'D

• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com

• Nice video on system details, on Safari, by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True

• Riak Handbook: http://www.riakhandbook.com

READINGS FOR GRAPHS

Graph Databases by Robinson, Webber, and Eifrem. Mostly about Neo4j; uses Cypher throughout.

RELATIONAL DATABASE EXAMPLES

http://db-engines.com/en/ranking_trend/relational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 2: Nosql

IrsquoM ERIC I work at Airisdata and we are hiring httpairisdatacom

WHATrsquoS WRONG WITH RELATIONAL DATABASES Nothing ) Google amp Amazon (followed by web tech) Higher Performance Larger Scale Lower Cost New Capabilities

BUT FIRST A POOR METAPHOR

Cars What leads to better performance

bull Bigger engine remove excess weightfeatures

bull Better controlssteeringbraking

WAIT I WANT MORE PERFORMANCE

We can go faster

BUT YOU CANrsquoT MOVE IKEA FURNITURE Feature loss Is it a car

WELL WE CAN SOLVE THAT PROBLEM Also it has a very powerful engine

THE CHALLENGE OF PERFORMANCE

ltadd wisdomgt long winded way to nosql is a poor label

SO WHAT IS NOSQL lsquoUM NON-RELATIONALrsquo No good definitions to be found For me Scales horizontally Foregoes the lsquoold schoolrsquo SQL relations concurrency etc

ldquoexactly like SQL (except where itrsquos not)rdquo Trades-in or reimagines most SQL features for lsquosomething elsersquo

Developer friendlydeveloper driven Schema loose semi-structured Usually Open Source and usually associated with web infrastructure Ignoring older non-relational databases of the past Scales Horizontally (usually) ndash did I mention that Can be lsquogluedrsquo to other data stores

Donrsquot like mine create your own definition )

SIDEBAR OBSERVATION ON SOFTWARE TEAMS

Software teams tied to large central relational database (think 1990s2000s) Large relational database lsquogluersquo teams and apps

together leads to complex databases and dbadmins

VsSoftware teams using no sql Independent except at the edges (inputlogs amp

outputreports)

FOWLERrsquoS IMPEDANCE MISMATCH

Java objects vs rows in tables

What I have called Fowlerrsquos Impedance is mentioned in his and Sadlagersquos book NoSQL Distilled

Most of nosql beasties can store data in more interesting ways

CAP Here because management loves to chat endlessly about it C is for Consistency ldquoThis is equivalent to requiring requests of the distributed shared memory

to act as if they were executing on a single node responding to operations one at a time

Most systems are not (exactly)

A is for Availability ldquoFor a distributed system to be continuously available every request

received by a non-failing node in the system must result in a response hellipeven when severe network failures occur every request must terminaterdquo

I think everyone here understands this one )

P is for Partition Tolerance ldquoIn order to model partition tolerance the network will be allowed to lose

arbitrarily many messages sent from one node to anotherrdquoQuotes from ldquoBrewerrsquos Conjecture and the Feasibility of Consistent Available Partition-Tolerant Web Servicesrdquo

YOU CAN HAVE TWO Consistency The system may shutdown or take a day to answer but you will have

the correct answer

Availability The system will always answer you might get your checking balance

from last year instead of todayrsquos balance but you will get an answer

Like asking a research group or asking folks in the pub Canrsquot have both ( One can accept the write not knowing if all the servers are up OR you can refuse until you know all the servers are up Partition Tolerance is mandatory in distributed systems

httpscodahalecomyou-cant-sacrifice-partition-tolerance

ONLY TWO THE FINE PRINT Only two at any moment in time ) For some systems you can choose different pairs for each operation (Cassandra Riak)

WHY WOULD ANYONE BE INCONSISTENT Speed while highly concurrent ldquogood now better is than perfect laterrdquo ie donrsquot block

Handling ldquopartition casesrdquo ie part of the systemnetwork is down

DB CHEMISTRY ndash MORE BUZZ Is it ACID or BASE Atomicity Consistency Isolation Durability Basically Available Soft-state Eventually consistent

See ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray

HOW TO DISTRIBUTE THE DATA

Option 1 shard Option 2 replicate Option 3 do both

WHAT WOULD LINNAEUS SAY

Key-Value

httpsenwikipediaorgwikiLinnaean_taxonomy

Graph DB

Document

Columnar (aka BigTable)

Disclaimer heavy overlap

COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups

COLUMNAR EXAMPLES

httpdb-enginescomenranking_trendwide+column+store

KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication

KEY VALUE EXAMPLES

httpdb-enginescomenranking_trendkey-value+store

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 3: Nosql

WHATrsquoS WRONG WITH RELATIONAL DATABASES Nothing ) Google amp Amazon (followed by web tech) Higher Performance Larger Scale Lower Cost New Capabilities

BUT FIRST A POOR METAPHOR

Cars What leads to better performance

bull Bigger engine remove excess weightfeatures

bull Better controlssteeringbraking

WAIT I WANT MORE PERFORMANCE

We can go faster

BUT YOU CANrsquoT MOVE IKEA FURNITURE Feature loss Is it a car

WELL WE CAN SOLVE THAT PROBLEM Also it has a very powerful engine

THE CHALLENGE OF PERFORMANCE

ltadd wisdomgt long winded way to nosql is a poor label

SO WHAT IS NOSQL lsquoUM NON-RELATIONALrsquo No good definitions to be found For me Scales horizontally Foregoes the lsquoold schoolrsquo SQL relations concurrency etc

ldquoexactly like SQL (except where itrsquos not)rdquo Trades-in or reimagines most SQL features for lsquosomething elsersquo

Developer friendlydeveloper driven Schema loose semi-structured Usually Open Source and usually associated with web infrastructure Ignoring older non-relational databases of the past Scales Horizontally (usually) ndash did I mention that Can be lsquogluedrsquo to other data stores

Donrsquot like mine create your own definition )

SIDEBAR OBSERVATION ON SOFTWARE TEAMS

Software teams tied to large central relational database (think 1990s2000s) Large relational database lsquogluersquo teams and apps

together leads to complex databases and dbadmins

VsSoftware teams using no sql Independent except at the edges (inputlogs amp

outputreports)

FOWLERrsquoS IMPEDANCE MISMATCH

Java objects vs rows in tables

What I have called Fowlerrsquos Impedance is mentioned in his and Sadlagersquos book NoSQL Distilled

Most of nosql beasties can store data in more interesting ways

CAP Here because management loves to chat endlessly about it C is for Consistency ldquoThis is equivalent to requiring requests of the distributed shared memory

to act as if they were executing on a single node responding to operations one at a time

Most systems are not (exactly)

A is for Availability ldquoFor a distributed system to be continuously available every request

received by a non-failing node in the system must result in a response hellipeven when severe network failures occur every request must terminaterdquo

I think everyone here understands this one )

P is for Partition Tolerance ldquoIn order to model partition tolerance the network will be allowed to lose

arbitrarily many messages sent from one node to anotherrdquoQuotes from ldquoBrewerrsquos Conjecture and the Feasibility of Consistent Available Partition-Tolerant Web Servicesrdquo

YOU CAN HAVE TWO Consistency The system may shutdown or take a day to answer but you will have

the correct answer

Availability The system will always answer you might get your checking balance

from last year instead of todayrsquos balance but you will get an answer

Like asking a research group or asking folks in the pub Canrsquot have both ( One can accept the write not knowing if all the servers are up OR you can refuse until you know all the servers are up Partition Tolerance is mandatory in distributed systems

httpscodahalecomyou-cant-sacrifice-partition-tolerance

ONLY TWO THE FINE PRINT Only two at any moment in time ) For some systems you can choose different pairs for each operation (Cassandra Riak)

WHY WOULD ANYONE BE INCONSISTENT Speed while highly concurrent ldquogood now better is than perfect laterrdquo ie donrsquot block

Handling ldquopartition casesrdquo ie part of the systemnetwork is down

DB CHEMISTRY ndash MORE BUZZ Is it ACID or BASE Atomicity Consistency Isolation Durability Basically Available Soft-state Eventually consistent

See ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray

HOW TO DISTRIBUTE THE DATA

Option 1 shard Option 2 replicate Option 3 do both

WHAT WOULD LINNAEUS SAY

Key-Value

httpsenwikipediaorgwikiLinnaean_taxonomy

Graph DB

Document

Columnar (aka BigTable)

Disclaimer heavy overlap

COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups

COLUMNAR EXAMPLES

httpdb-enginescomenranking_trendwide+column+store

KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication

KEY VALUE EXAMPLES

httpdb-enginescomenranking_trendkey-value+store

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 4: Nosql

BUT FIRST A POOR METAPHOR

Cars What leads to better performance

bull Bigger engine remove excess weightfeatures

bull Better controlssteeringbraking

WAIT I WANT MORE PERFORMANCE

We can go faster

BUT YOU CANrsquoT MOVE IKEA FURNITURE Feature loss Is it a car

WELL WE CAN SOLVE THAT PROBLEM Also it has a very powerful engine

THE CHALLENGE OF PERFORMANCE

ltadd wisdomgt long winded way to nosql is a poor label

SO WHAT IS NOSQL lsquoUM NON-RELATIONALrsquo No good definitions to be found For me Scales horizontally Foregoes the lsquoold schoolrsquo SQL relations concurrency etc

ldquoexactly like SQL (except where itrsquos not)rdquo Trades-in or reimagines most SQL features for lsquosomething elsersquo

Developer friendlydeveloper driven Schema loose semi-structured Usually Open Source and usually associated with web infrastructure Ignoring older non-relational databases of the past Scales Horizontally (usually) ndash did I mention that Can be lsquogluedrsquo to other data stores

Donrsquot like mine create your own definition )

SIDEBAR OBSERVATION ON SOFTWARE TEAMS

Software teams tied to a large central relational database (think 1990s/2000s): the large relational database ‘glues’ teams and apps together, which leads to complex databases and DB admins.

Vs.

Software teams using NoSQL: independent except at the edges (input/logs & output/reports).

FOWLER’S IMPEDANCE MISMATCH

Java objects vs. rows in tables

What I have called Fowler’s Impedance Mismatch is discussed in his and Sadalage’s book NoSQL Distilled.

Most NoSQL beasties can store data in more interesting ways.
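To make the mismatch concrete, here is a minimal Python sketch. The `Order` and `LineItem` classes and the two-table layout are hypothetical illustrations, not from the talk. The same object must be split into flat rows (and re-joined in code) for a relational store, while a document-style store can keep it as one nested value.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class LineItem:
    sku: str
    qty: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)


def to_rows(order):
    """Flatten one object into rows for two hypothetical tables: orders and items."""
    orders_row = (order.order_id, order.customer)
    item_rows = [(order.order_id, i.sku, i.qty) for i in order.items]
    return orders_row, item_rows


def from_rows(orders_row, item_rows):
    """Reassemble the object: the 'join' happens in application code."""
    order_id, customer = orders_row
    items = [LineItem(sku, qty) for oid, sku, qty in item_rows if oid == order_id]
    return Order(order_id, customer, items)


def to_document(order):
    """A document store can keep the nested structure as a single value."""
    return asdict(order)


order = Order(42, "Ann", [LineItem("book", 1), LineItem("pen", 3)])
print(to_rows(order))      # ((42, 'Ann'), [(42, 'book', 1), (42, 'pen', 3)])
print(to_document(order))  # one nested dict, items included
```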

CAP

Here because management loves to chat endlessly about it.

C is for Consistency: “This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time.”

Most systems are not (exactly).

A is for Availability: “For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response … even when severe network failures occur, every request must terminate.”

I think everyone here understands this one :)

P is for Partition Tolerance: “In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another.”

Quotes from “Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services”.

YOU CAN HAVE TWO

Consistency: the system may shut down or take a day to answer, but you will get the correct answer.

Availability: the system will always answer. You might get your checking balance from last year instead of today’s balance, but you will get an answer.

Like asking a research group vs. asking folks in the pub. Can’t have both :(

During a partition, a node can either accept the write without knowing whether all the servers are up, OR refuse until it knows all the servers are up. Partition tolerance is mandatory in distributed systems, so the real choice is between C and A.

https://codahale.com/you-cant-sacrifice-partition-tolerance/
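The choice above can be sketched with a toy two-replica store in Python. This is an illustrative simulation under simplified assumptions (two replicas, synchronous replication, a single `partitioned` flag), not how any particular database works. In "CP" mode it refuses requests during a partition; in "AP" mode it always answers, possibly with a stale balance.

```python
class TwoReplicaStore:
    """Toy store with two replicas, "a" and "b", to illustrate the CAP choice.

    mode="CP": during a partition, refuse requests rather than risk a wrong answer.
    mode="AP": during a partition, always answer, even if the answer is stale.
    """

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"a": None, "b": None}
        self.partitioned = False

    def write(self, node, value):
        if self.partitioned:
            if self.mode == "CP":
                # Can't confirm the other replica got the write: give up availability.
                raise RuntimeError("unavailable during partition")
            self.values[node] = value  # AP: accept locally; replicas now disagree
            return
        for replica in self.values:  # no partition: replicate to both
            self.values[replica] = value

    def read(self, node):
        if self.partitioned and self.mode == "CP":
            # With only two nodes, neither side of the split holds a majority.
            raise RuntimeError("unavailable during partition")
        return self.values[node]


for mode in ("CP", "AP"):
    store = TwoReplicaStore(mode)
    store.write("a", "balance=100")
    store.partitioned = True
    try:
        store.write("a", "balance=50")
        print(mode, "read from b:", store.read("b"))  # AP: stale "balance=100"
    except RuntimeError as err:
        print(mode, "error:", err)                     # CP: refused
```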

ONLY TWO: THE FINE PRINT

Only two at any moment in time :) In some systems (Cassandra, Riak) you can choose a different pair for each operation.

WHY WOULD ANYONE BE INCONSISTENT?

• Speed while highly concurrent: “good now is better than perfect later”, i.e. don’t block

• Handling “partition cases”, i.e. part of the system/network is down

DB CHEMISTRY: MORE BUZZ

Is it ACID or BASE?

• ACID: Atomicity, Consistency, Isolation, Durability
• BASE: Basically Available, Soft state, Eventually consistent

See “The Transaction Concept: Virtues and Limitations” by Jim Gray.

HOW TO DISTRIBUTE THE DATA

Option 1: shard. Option 2: replicate. Option 3: do both.
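The three options can be sketched in a few lines of Python. This is a minimal illustration assuming simple hash-mod-N placement and hypothetical node names; real systems use more elaborate schemes. With `replicas=1` each key lives on one node (sharding), with `replicas=len(nodes)` every node holds everything (replication), and anything in between does both.

```python
import hashlib


def stable_hash(key):
    # Python's built-in hash() of a str is randomized per process; use a stable digest.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def place(key, nodes, replicas=1):
    """Return the nodes that hold `key`.

    replicas=1          -> Option 1, shard: exactly one owner per key.
    replicas=len(nodes) -> Option 2, replicate: every node has every key.
    in between          -> Option 3, both: the key's shard plus the next nodes.
    """
    if not 1 <= replicas <= len(nodes):
        raise ValueError("replicas must be between 1 and len(nodes)")
    first = stable_hash(key) % len(nodes)
    return [nodes[(first + i) % len(nodes)] for i in range(replicas)]


nodes = ["node0", "node1", "node2", "node3"]  # hypothetical cluster
for key in ("alice", "bob", "carol"):
    print(key, place(key, nodes, replicas=2))
```

One weakness of plain `hash % N`: adding or removing a node changes N and reshuffles most keys. That is what the consistent hashing listed on the Riak DISTRIBUTED slide is designed to avoid.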

WHAT WOULD LINNAEUS SAY?

• Key-Value
• Graph DB
• Document
• Columnar (aka BigTable)

Disclaimer: heavy overlap

https://en.wikipedia.org/wiki/Linnaean_taxonomy

COLUMNAR STORES

Inspired by Google’s Bigtable. Funky row/column setups.

COLUMNAR EXAMPLES

http://db-engines.com/en/ranking_trend/wide+column+store

KEY-VALUE STORES

• Designed for speed (even memory-only)
• High load
• Global data model of key-values (surprise)
• Ring partitioning and replication

KEY VALUE EXAMPLES

http://db-engines.com/en/ranking_trend/key-value+store

DOCUMENT STORES

Similar to key-value, but the value is a document.

• Documents are stored as JSON (or similar)
• Flexible schema
• Some support keys/references/indices

{"date": [2016, 4, 1], "booktitle": "Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams"}
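A minimal Python sketch of the idea, using only the standard `json` module. The "collection" is just a dict of key to JSON text, and the second document (with different fields) is a hypothetical addition to show the flexible schema. Fetching by key is the key-value part; looking inside documents by field is what a document store adds.

```python
import json

# Toy collection: key -> document stored as JSON text.
collection = {
    "book:1": json.dumps({"date": [2016, 4, 1],
                          "booktitle": "Hitchhiker's Guide to the Galaxy",
                          "author": "Douglas Adams"}),
    # A different shape in the same collection: no "date", a list of authors, tags.
    "book:2": json.dumps({"booktitle": "NoSQL Distilled",
                          "authors": ["Sadalage", "Fowler"],
                          "tags": ["nosql"]}),
}


def get(key):
    """Key-value style access: fetch the whole document by its key."""
    return json.loads(collection[key])


def find(field, value):
    """Document style access: match a top-level field inside each document.
    Documents without the field are skipped; a real store would use an index."""
    return [k for k, text in collection.items() if json.loads(text).get(field) == value]


print(get("book:1")["date"])            # [2016, 4, 1]
print(find("author", "Douglas Adams"))  # ['book:1']
print(find("tags", ["nosql"]))          # ['book:2']
```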

DOCUMENT EXAMPLES

http://db-engines.com/en/ranking_trend/document+store

GRAPH DATABASES

Remember your data structures class in college?

• Edges and vertices: both can hold data
• Reduces tough SQL queries to simple graph queries
• Easier to model: ‘matches the whiteboard’
• Relationships between vertices are first class
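To illustrate "relationships are first class", here is a small in-memory property graph in Python with a breadth-first "friends of friends" walk. The vertices, edge labels, and properties are hypothetical. In SQL the same question needs one self-join per hop; here each hop just follows stored edges.

```python
from collections import deque

# Vertices and edges both carry data (a "property graph").
vertices = {
    "eric": {"kind": "person"},
    "ann": {"kind": "person"},
    "bob": {"kind": "person"},
    "acme": {"kind": "company"},
}
edges = [
    ("eric", "KNOWS", "ann", {"since": 2010}),
    ("ann", "KNOWS", "bob", {"since": 2014}),
    ("eric", "WORKS_AT", "acme", {"role": "engineer"}),
]

# Adjacency index: relationships are stored directly, not rediscovered via joins.
out_edges = {}
for src, label, dst, props in edges:
    out_edges.setdefault(src, []).append((label, dst, props))


def reachable(start, label, max_hops):
    """Breadth-first walk along edges with `label`, up to `max_hops` away.
    Returns {vertex: number_of_hops}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if seen[v] == max_hops:
            continue
        for lbl, dst, _props in out_edges.get(v, []):
            if lbl == label and dst not in seen:
                seen[dst] = seen[v] + 1
                queue.append(dst)
    del seen[start]
    return seen


print(reachable("eric", "KNOWS", 2))  # {'ann': 1, 'bob': 2}
print({v: vertices[v]["kind"] for v in reachable("eric", "KNOWS", 2)})
```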

GRAPH DB EXAMPLES

http://db-engines.com/en/ranking_trend/graph+dbms

HBASE

NoSQL on top of Hadoop

SITS ON TOP OF HDFS

Name nodes, data nodes, replication, and the rest of that whole megillah.

• Column-oriented
• Handles ‘wide’, ‘sparse’ tables well
• Fault tolerant
• Supports Java, REST, Avro and Thrift
• Operations are atomic at the row level; writes also go through a write-ahead log

KIND OF SQL

• Key-values: keys are arbitrary byte strings; the value is an entire row of data
• No joins
• Apache Phoenix adds a JDBC interface

COLUMN FAMILIES

• A column’s full name = column family + column qualifier (written family:qualifier)
• Each column family’s performance is configured independently
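A toy in-memory model of the Bigtable/HBase layout in Python: row key, then column family, then qualifier, then timestamped versions. This is a sketch of the data model only, not the HBase client API; the class, table, and column names are hypothetical. It shows why sparse rows are cheap (absent cells simply aren't stored) and why families must be declared up front.

```python
import time


class ToyTable:
    """Row key -> column family -> qualifier -> {timestamp: value}.
    Rows are sparse: a row only stores the cells it actually has."""

    def __init__(self, families):
        self.families = set(families)  # families are fixed when the table is created
        self.rows = {}

    def put(self, row, column, value, ts=None):
        family, qualifier = column.split(":", 1)
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        ts = time.time_ns() if ts is None else ts
        cells = (self.rows.setdefault(row, {})
                 .setdefault(family, {})
                 .setdefault(qualifier, {}))
        cells[ts] = value  # keep every version, keyed by timestamp

    def get(self, row, column):
        """Return the newest version of a cell, or None if the cell is absent."""
        family, qualifier = column.split(":", 1)
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier)
        if not versions:
            return None
        return versions[max(versions)]


t = ToyTable(["info", "stats"])
t.put("user#42", "info:name", "Eric", ts=1)
t.put("user#42", "info:name", "Eric M.", ts=2)  # newer version of the same cell
t.put("user#43", "stats:logins", 7, ts=1)       # a different, sparse row
print(t.get("user#42", "info:name"))  # Eric M.
print(t.get("user#43", "info:name"))  # None
```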

REGIONS

Look like shards: different key ranges per box, no overlap.
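Because regions are non-overlapping, sorted key ranges, finding the region for a row key is a binary search over region start keys. A minimal Python sketch with a hypothetical region layout and server names (string comparison stands in for HBase's byte-wise key ordering):

```python
import bisect

# Hypothetical layout: region i owns [region_starts[i], region_starts[i + 1]).
# "" means the first region starts at the very beginning of the key space.
region_starts = ["", "g", "n", "t"]
region_servers = ["rs1", "rs2", "rs3", "rs1"]


def region_for(row_key):
    """Return (region index, server) for the one region containing row_key.
    Start keys are inclusive, so "g" itself belongs to region 1."""
    i = bisect.bisect_right(region_starts, row_key) - 1
    return i, region_servers[i]


for key in ["apple", "grape", "melon", "zebra"]:
    print(key, region_for(key))
```

Unlike hash-based sharding, range-based regions keep neighboring keys together, so a scan over a key range touches only a few regions.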

CASSANDRA

Tunable NoSQL

CAP WITH QUORUMS/KNOB TWEAKING

• Symmetric peer to peer
• Linearly scalable
• Replication
• Eventual consistency
• Partitioning

CAP WITH QUORUMS/KNOB TWEAKING

Some systems let you choose per event (per operation). Three knobs:

• Replication amount: how many copies of each piece of data
• How many successful writes == “your write to the database is done”
• How many successful reads out of the full set == “here is your data”

The higher the values, the longer the wait.
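The three knobs are often written N (replicas), W (write acknowledgements) and R (read responses). In Cassandra they correspond to the replication factor and per-request consistency levels such as ONE, QUORUM and ALL. The usual rule of thumb is that if R + W > N, every read set overlaps every write set, so a read reaches at least one replica holding the latest acknowledged write. A minimal Python sketch that states the rule and checks it by brute force:

```python
from itertools import combinations


def quorum(n):
    """Majority of n replicas, as used by a QUORUM-style consistency level."""
    return n // 2 + 1


def overlap_guaranteed(n, w, r):
    """Rule of thumb: R + W > N means every read set meets every write set."""
    return r + w > n


def check_by_brute_force(n, w, r):
    """Confirm the rule by trying every possible write set and read set."""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))


for n, w, r in [(3, 1, 1), (3, 2, 2), (3, 3, 1), (5, 2, 3), (5, quorum(5), quorum(5))]:
    print(f"N={n} W={w} R={r}: overlap={overlap_guaranteed(n, w, r)}")
```

Raising W or R buys stronger reads at the cost of waiting for more replicas, which is the "higher the values, longer the wait" trade-off.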

GOSSIP AND PEERING

• Who’s up?
• Passing requests
• Handling missing nodes

DATA

• ColumnFamilies
• Keys and values
• Speed via appending data and timestamps
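"Appending data and timestamps" means writes never update in place; conflicting versions are resolved at read or merge time, and Cassandra keeps the cell with the newest timestamp (last write wins). A simplified Python sketch of that idea, using plain lists as hypothetical stand-ins for replicas or on-disk files:

```python
def write(log, key, column, value, ts):
    """Writes never update in place: they just append a timestamped cell."""
    log.append((key, column, ts, value))


def read(logs, key, column):
    """Merge cells from several sources and keep the highest timestamp.
    On a timestamp tie this sketch keeps the first cell it saw."""
    best = None
    for log in logs:
        for k, c, ts, value in log:
            if k == key and c == column and (best is None or ts > best[0]):
                best = (ts, value)
    return None if best is None else best[1]


replica_a, replica_b = [], []
write(replica_a, "user:42", "email", "old@example.com", ts=100)
write(replica_b, "user:42", "email", "new@example.com", ts=200)
write(replica_a, "user:42", "city", "Newark", ts=150)
print(read([replica_a, replica_b], "user:42", "email"))  # new@example.com
```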

KEY VALUE: DYNAMO

• Replication
• REST
• Protocol Buffers for queries
• Tunable consistency

RIAK

• Simple interface
• High write availability
• Linear scaling
• REST API via HTTP: PUT, GET, DELETE, POST, etc.
• Or Protobufs for quicker, serialized data
• ‘Hundreds of nodes’

DISTRIBUTED

• Consistent hashing
• Vector clocks
• Sloppy quorums
• Virtual nodes (not machines but lightweight processes; more like having eggs in many baskets, which makes it easier to hand the eggs to other folks during a failure)
• Hinted handoff (“please pass along”)
• Replication

Request -> Riak node <- asks other nodes ->
     |                         |
 virtual node ->          virtual node ->
     |                         |
 data store               data store

And then the answers return back up the stack.
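A minimal Python sketch of consistent hashing with virtual nodes. This is the generic "random tokens on a ring" flavor. Riak's own ring is split into a fixed number of equal-sized partitions claimed by physical nodes, so treat this as an illustration of the idea rather than Riak's implementation; node names and the vnode count are hypothetical. Each physical node owns many small slices, and removing a node moves only the keys it owned.

```python
import bisect
import hashlib


def h(s):
    """Stable 64-bit ring position (Python's hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Consistent hashing with virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=64):
        self.vnodes_per_node = vnodes_per_node
        self.points = []  # sorted list of (ring position, physical node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes_per_node):
            bisect.insort(self.points, (h(f"{node}#vnode{i}"), node))

    def remove(self, node):
        self.points = [p for p in self.points if p[1] != node]

    def preference_list(self, key, n=3):
        """Walk clockwise from the key's position and collect the first n
        distinct physical nodes: the replicas responsible for that key."""
        start = bisect.bisect(self.points, (h(key),))
        result = []
        for i in range(len(self.points)):
            node = self.points[(start + i) % len(self.points)][1]
            if node not in result:
                result.append(node)
                if len(result) == n:
                    break
        return result


ring = Ring(["n1", "n2", "n3", "n4"])
print(ring.preference_list("user:42"))  # three distinct nodes

keys = [f"key{i}" for i in range(1000)]
before = {k: ring.preference_list(k, n=1)[0] for k in keys}
ring.remove("n4")
after = {k: ring.preference_list(k, n=1)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
# Only keys that n4 owned move; every other key keeps its primary node.
print(len(moved), all(before[k] == "n4" for k in moved))
```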

SERVERS

• “Just add more” servers
• Ring architecture: all nodes are peers
• Gossip protocols

KEYS AND BUCKETS

Riak can create keys automatically (and return the key to you).

http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true (gets all the keys)
http://SERVER:PORT/riak/BUCKET?keys=stream (better for huge sets of data)

You can store your code in a bucket.

LINKS

curl ... -H 'Link: </riak/BUCKET/KEY>; riaktag="tagname"'

Link walking can create other structures.
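A small Python sketch that builds (but does not send) requests in the URL and Link-header shapes shown on these slides, using only the standard library. The host, port 8098 (Riak's usual HTTP port), and the bucket, key, and tag names are assumptions for illustration. Later Riak releases also expose `/buckets/BUCKET/keys/KEY` paths and deprecated link walking, so check the documentation for your version before relying on the `/riak/` form.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://localhost:8098"  # hypothetical SERVER:PORT


def object_url(bucket, key=None, **params):
    """Slide-style Riak path: /riak/BUCKET[/KEY][?query]."""
    path = f"/riak/{quote(bucket, safe='')}"
    if key is not None:
        path += f"/{quote(key, safe='')}"
    if params:
        path += "?" + urlencode(params)
    return BASE + path


def link_header(bucket, key, tag):
    """One Link header entry pointing at another object, tagged for link walking."""
    return f'</riak/{quote(bucket, safe="")}/{quote(key, safe="")}>; riaktag="{tag}"'


# Build (but do not send) a PUT storing JSON with a link to another object.
req = Request(
    object_url("people", "eric"),
    data=b'{"name": "Eric"}',
    headers={"Content-Type": "application/json",
             "Link": link_header("companies", "acme", "works_at")},
    method="PUT",
)
print(req.get_method(), req.full_url)
print(object_url("people", keys="stream"))  # list keys, streamed
```

Passing `req` to `urllib.request.urlopen` would actually send it; that is left out here because it needs a running server.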

HOMEWORK AND OTHER READINGS: GENERAL

• Brewer’s conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf

• Vogels’ thoughts on eventual consistency: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

• Old-school techniques for “almost perfect” systems: “The Transaction Concept: Virtues and Limitations” by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf

• ACID defined: Haerder and Reuter, “Principles of Transaction-Oriented Database Recovery”, http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p

• All your BASE: Dan Pritchett, “BASE: An Acid Alternative”, http://queue.acm.org/detail.cfm?id=1394128

• NoSQL Distilled by Sadalage and Fowler

• Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONT’D

• Google’s Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

• HBase: The Definitive Guide by Lars George

• HBase in Action by Dimiduk and Khurana

• Hadoop: The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONT’D

• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com

• Nice video on system details on Safari by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True

• Riak Handbook: http://www.riakhandbook.com

READINGS FOR GRAPHS

Graph Databases by Robinson, Webber and Eifrem. Mostly about Neo4j; uses Cypher throughout.

RELATIONAL DATABASE EXAMPLES

http://db-engines.com/en/ranking_trend/relational+dbms

Page 7: Nosql

WELL WE CAN SOLVE THAT PROBLEM Also it has a very powerful engine

THE CHALLENGE OF PERFORMANCE

ltadd wisdomgt long winded way to nosql is a poor label

SO WHAT IS NOSQL lsquoUM NON-RELATIONALrsquo No good definitions to be found For me Scales horizontally Foregoes the lsquoold schoolrsquo SQL relations concurrency etc

ldquoexactly like SQL (except where itrsquos not)rdquo Trades-in or reimagines most SQL features for lsquosomething elsersquo

Developer friendlydeveloper driven Schema loose semi-structured Usually Open Source and usually associated with web infrastructure Ignoring older non-relational databases of the past Scales Horizontally (usually) ndash did I mention that Can be lsquogluedrsquo to other data stores

Donrsquot like mine create your own definition )

SIDEBAR OBSERVATION ON SOFTWARE TEAMS

Software teams tied to large central relational database (think 1990s2000s) Large relational database lsquogluersquo teams and apps

together leads to complex databases and dbadmins

VsSoftware teams using no sql Independent except at the edges (inputlogs amp

outputreports)

FOWLERrsquoS IMPEDANCE MISMATCH

Java objects vs rows in tables

What I have called Fowlerrsquos Impedance is mentioned in his and Sadlagersquos book NoSQL Distilled

Most of nosql beasties can store data in more interesting ways

CAP Here because management loves to chat endlessly about it C is for Consistency ldquoThis is equivalent to requiring requests of the distributed shared memory

to act as if they were executing on a single node responding to operations one at a time

Most systems are not (exactly)

A is for Availability ldquoFor a distributed system to be continuously available every request

received by a non-failing node in the system must result in a response hellipeven when severe network failures occur every request must terminaterdquo

I think everyone here understands this one )

P is for Partition Tolerance ldquoIn order to model partition tolerance the network will be allowed to lose

arbitrarily many messages sent from one node to anotherrdquoQuotes from ldquoBrewerrsquos Conjecture and the Feasibility of Consistent Available Partition-Tolerant Web Servicesrdquo

YOU CAN HAVE TWO Consistency The system may shutdown or take a day to answer but you will have

the correct answer

Availability The system will always answer you might get your checking balance

from last year instead of todayrsquos balance but you will get an answer

Like asking a research group or asking folks in the pub Canrsquot have both ( One can accept the write not knowing if all the servers are up OR you can refuse until you know all the servers are up Partition Tolerance is mandatory in distributed systems

httpscodahalecomyou-cant-sacrifice-partition-tolerance

ONLY TWO THE FINE PRINT Only two at any moment in time ) For some systems you can choose different pairs for each operation (Cassandra Riak)

WHY WOULD ANYONE BE INCONSISTENT Speed while highly concurrent ldquogood now better is than perfect laterrdquo ie donrsquot block

Handling ldquopartition casesrdquo ie part of the systemnetwork is down

DB CHEMISTRY: MORE BUZZ Is it ACID or BASE? ACID: Atomicity, Consistency, Isolation, Durability. BASE: Basically Available, Soft state, Eventually consistent.

See "The Transaction Concept: Virtues and Limitations" by Jim Gray.

HOW TO DISTRIBUTE THE DATA

Option 1: shard. Option 2: replicate. Option 3: do both.
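Here is a minimal sketch of the three options, assuming a hypothetical four-node cluster. Sharding sends each key to exactly one node, chosen by hash. Replicating copies every key onto every node. Doing both puts each key on its hash-chosen node plus the next `copies - 1` nodes. MD5 is used instead of Python's built-in `hash()` because `hash()` of a string is salted per process, and placement must be stable.

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]  # hypothetical cluster


def shard_for(key: str, n_nodes: int) -> int:
    """Option 1 (shard): pick exactly one node index for this key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_nodes


def placement(key: str, nodes: list[str], copies: int) -> list[str]:
    """Option 3 (both): shard by hash, then replicate onto the following nodes.

    copies == 1 is pure sharding; copies == len(nodes) is full replication (option 2).
    """
    copies = min(copies, len(nodes))
    first = shard_for(key, len(nodes))
    return [nodes[(first + i) % len(nodes)] for i in range(copies)]


sharded = placement("user:42", NODES, 1)     # one home for the key
replicated = placement("user:42", NODES, 4)  # every node has it
both = placement("user:42", NODES, 2)        # home node plus one replica
```

One caveat: with plain `hash % N`, changing the number of nodes remaps most keys. That is one reason ring-based (consistent) hashing shows up in the key-value stores below.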

WHAT WOULD LINNAEUS SAY

Key-Value

https://en.wikipedia.org/wiki/Linnaean_taxonomy

Graph DB

Document

Columnar (aka BigTable)

Disclaimer: heavy overlap.

COLUMNAR STORES Inspired by Google's Bigtable. Funky row/column setups.

COLUMNAR EXAMPLES

http://db-engines.com/en/ranking_trend/wide+column+store

KEY-VALUE STORES Designed for speed (some are even memory-only) and high load. Global data model of key-values (surprise!). Ring partitioning and replication.
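"Ring partitioning and replication" usually means consistent hashing. Nodes and keys are hashed onto the same circle. Each key belongs to the first node found walking clockwise from the key's position, and it is replicated to the next distinct nodes after that. The sketch below assumes an MD5-based 32-bit ring and 8 points per node (both illustrative choices), and the class name is hypothetical. It shows the property that makes rings attractive: when a node joins, a key's owner either stays the same or becomes the new node.

```python
import bisect
import hashlib


def ring_hash(s: str) -> int:
    """Position on a 2**32 ring (MD5-based; an illustrative choice)."""
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:4], "big")


class HashRing:
    """Minimal consistent-hashing ring where each node owns several points."""

    def __init__(self, nodes, points_per_node: int = 8):
        self._points = sorted(
            (ring_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._positions = [pos for pos, _ in self._points]

    def preference_list(self, key: str, n_replicas: int) -> list[str]:
        """Walk clockwise from the key, collecting distinct physical nodes."""
        start = bisect.bisect_right(self._positions, ring_hash(key))
        owners: list[str] = []
        for step in range(len(self._points)):
            _, node = self._points[(start + step) % len(self._points)]
            if node not in owners:
                owners.append(node)
                if len(owners) == n_replicas:
                    break
        return owners


ring = HashRing(["a", "b", "c", "d"])
replicas = ring.preference_list("user:42", 3)  # primary plus two replicas

bigger = HashRing(["a", "b", "c", "d", "e"])   # one node joins
moved = [
    k for k in (f"key{i}" for i in range(200))
    if ring.preference_list(k, 1) != bigger.preference_list(k, 1)
]
```

Every key in `moved` now belongs to the new node `e`. Keys owned by the existing nodes never shuffle among themselves, so only a fraction of the data has to move.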

KEY VALUE EXAMPLES

http://db-engines.com/en/ranking_trend/key-value+store

DOCUMENT STORES Similar to key-value, but the value is a document.

The document is stored in JSON (or similar). Flexible schema. Some support keys/references/indices.

{ "date": [2016, 4, 1], "booktitle": "Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams" }
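Here is a minimal in-memory sketch of the idea: a key maps to a JSON document, and documents in the same store need not share fields. The class and its methods are hypothetical. `find` does a full scan, whereas a real document store would consult an index when one exists. The second book document is an illustrative addition.

```python
import json


class TinyDocStore:
    """Hypothetical document store: each key maps to a JSON document."""

    def __init__(self):
        self._docs: dict[str, str] = {}  # key -> JSON text

    def put(self, key: str, doc: dict) -> None:
        self._docs[key] = json.dumps(doc)  # no schema check: any JSON object goes

    def get(self, key: str) -> dict:
        return json.loads(self._docs[key])

    def find(self, field: str, value) -> list[str]:
        """Keys of documents whose `field` equals `value` (full scan, no index)."""
        return sorted(
            k for k, text in self._docs.items()
            if json.loads(text).get(field) == value
        )


store = TinyDocStore()
store.put("book:1", {"date": [2016, 4, 1],
                     "booktitle": "Hitchhiker's Guide to the Galaxy",
                     "author": "Douglas Adams"})
# Different shape, same store: no date, plus a field the first document lacks.
store.put("book:2", {"booktitle": "The Restaurant at the End of the Universe",
                     "author": "Douglas Adams",
                     "tags": ["sequel"]})

by_author = store.find("author", "Douglas Adams")
sequels = store.find("tags", ["sequel"])
```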

DOCUMENT EXAMPLES

http://db-engines.com/en/ranking_trend/document+store

GRAPH DATABASES Remember your data structures class in college?

Edges and vertices: both can hold data.

Reduces tough SQL queries to simple graph queries.

Easier to model: "matches the whiteboard".

Relationships between vertices are first-class.
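As a small illustration of "tough SQL, simple graph query", consider "who can Alice reach through up to two KNOWS hops?". In SQL that takes a self-join per hop or a recursive CTE. Over an adjacency structure it is a bounded breadth-first search. The vertices, edges and properties below are hypothetical, and this is a plain-Python property-graph sketch rather than any graph database's API.

```python
from collections import deque

# Both vertices and edges carry data (hypothetical example graph).
vertices = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
    "nosql": {"kind": "topic"},
}
edges = [  # (from, label, to, edge properties)
    ("alice", "KNOWS", "bob", {"since": 2012}),
    ("bob", "KNOWS", "carol", {"since": 2015}),
    ("carol", "LIKES", "nosql", {}),
]

# Relationships are first-class: index them once by source vertex.
adjacency: dict[str, list[tuple[str, str]]] = {}
for src, label, dst, _props in edges:
    adjacency.setdefault(src, []).append((label, dst))


def reachable(start: str, label: str, max_hops: int) -> set[str]:
    """Vertices reachable from `start` via up to `max_hops` edges with `label`."""
    seen = {start}
    found: set[str] = set()
    frontier = deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for edge_label, nxt in adjacency.get(vertex, []):
            if edge_label == label and nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                frontier.append((nxt, depth + 1))
    return found


friends_of_friends = reachable("alice", "KNOWS", 2)
people = sorted(v for v in friends_of_friends if vertices[v]["kind"] == "person")
```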

GRAPH DB EXAMPLES

http://db-engines.com/en/ranking_trend/graph+dbms

HBASE NoSQL on top of Hadoop

SITS ON TOP OF HDFS Name nodes, data nodes, replication, and the rest of that whole megillah.

Column-oriented. Handles "wide", "sparse" tables well. Fault tolerant. Supports Java, REST, Avro and Thrift.

All operations are atomic at the row level (via write-ahead logs).

KIND OF SQL Key-values: keys are arbitrary strings; values are an entire row of data. No joins. Apache Phoenix adds a JDBC interface.

COLUMN FAMILIES A column's full name = family name + column qualifier (written family:qualifier). Each column family's performance is configured independently.
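Here is a rough mental model of the data layout, written as a hypothetical in-memory class rather than HBase's client API. The model is: row key -> "family:qualifier" -> {timestamp: value}. Families must be declared up front, but qualifiers can be invented per row, which is why wide, sparse tables are cheap. In real HBase, per-family settings such as how many versions to keep are part of that independent configuration. This sketch simply keeps all versions.

```python
import time
from collections import defaultdict
from typing import Optional


class TinyWideTable:
    """Hypothetical HBase-like table: row -> 'family:qualifier' -> {timestamp: value}."""

    def __init__(self, families):
        self.families = set(families)  # column families are declared up front
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row: str, family: str, qualifier: str, value: str,
            ts: Optional[int] = None) -> None:
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        column = f"{family}:{qualifier}"  # the column's full name
        stamp = ts if ts is not None else time.time_ns()
        self.rows[row][column][stamp] = value  # older versions stay around

    def get(self, row: str, family: str, qualifier: str) -> Optional[str]:
        """Newest version of one cell, or None when the cell is absent (sparse)."""
        versions = self.rows.get(row, {}).get(f"{family}:{qualifier}")
        if not versions:
            return None
        return versions[max(versions)]


table = TinyWideTable(["info", "stats"])
table.put("user#42", "info", "email", "old@example.com", ts=1)
table.put("user#42", "info", "email", "new@example.com", ts=2)
latest = table.get("user#42", "info", "email")
missing = table.get("user#42", "stats", "logins")
```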

REGIONS Look like shards: each box serves a different key range, with no overlap.
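Because region key ranges are sorted and never overlap, finding the region for a row key is a binary search over the region start keys. The boundaries and server names below are hypothetical. Python compares strings by code point, which matches HBase's byte-wise ordering only for plain ASCII keys like these.

```python
import bisect

# Hypothetical layout: region i holds row keys in [region_starts[i], region_starts[i + 1]).
region_starts = ["", "g", "n", "t"]            # "" = start of the key space
region_servers = ["rs1", "rs2", "rs3", "rs1"]  # one server may host several regions


def region_for(row_key: str) -> int:
    """Index of the single region whose key range contains row_key."""
    return bisect.bisect_right(region_starts, row_key) - 1


def server_for(row_key: str) -> str:
    return region_servers[region_for(row_key)]
```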

CASSANDRA Tunable NoSQL

CAP WITH QUORUMS/KNOB TWEAKING

Symmetric peer-to-peer. Linearly scalable. Replication. Eventual consistency. Partitioning.

CAP WITH QUORUMS/KNOB TWEAKING

Some systems let you choose per event. Three knobs: the replication amount (N); how many successful writes == "your write to the database is done" (W); how many successful reads out of the full set == "here is your data" (R).

The higher the values, the longer the wait.
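The usual arithmetic behind these knobs fits in a few lines. With strict quorums, if R + W > N then every read set overlaps every write set, so at least one replica you read from holds the latest acknowledged write. The cost is waiting for more replicas and tolerating fewer failures. This is a sketch of the standard Dynamo-style reasoning, and the function name is hypothetical. Sloppy quorums (as in Riak) can write to stand-in nodes during failures, which weakens the overlap guarantee.

```python
def quorum_properties(n: int, w: int, r: int) -> dict:
    """Quorum arithmetic for N replicas, W write acks and R read replies (strict quorums)."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("need 1 <= W <= N and 1 <= R <= N")
    return {
        # Any R replicas share at least one member with any W replicas,
        # so the read sees at least one copy of the latest acknowledged write.
        "read_overlaps_write": r + w > n,
        # How many replicas may be unreachable while writes / reads still complete.
        "writes_survive_down": n - w,
        "reads_survive_down": n - r,
    }


majority = quorum_properties(3, 2, 2)  # "quorum" reads and writes
fastest = quorum_properties(3, 1, 1)   # low latency, eventual consistency
```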

GOSSIP AND PEERING Who's up? Passing requests. Handling missing nodes.
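"Who's up?" can be answered without a central coordinator by gossiping heartbeats. Each round, every live node bumps its own counter and swaps its table with one random peer, keeping the higher counter for every node. A node whose counter has not increased locally for a while is suspected down. This is a simplified, hypothetical sketch of heartbeat gossip in general, not Cassandra's actual protocol. The class, the function and the timeout are all illustrative.

```python
import random


class GossipNode:
    """Hypothetical member of a heartbeat-gossip cluster."""

    def __init__(self, name: str):
        self.name = name
        self.heartbeats = {name: 0}   # node -> highest heartbeat counter seen
        self.last_change = {name: 0}  # node -> local round when that counter last rose

    def tick(self, now: int) -> None:
        """Bump our own heartbeat: 'I am still alive'."""
        self.heartbeats[self.name] += 1
        self.last_change[self.name] = now

    def merge(self, other: "GossipNode", now: int) -> None:
        """Adopt any newer heartbeat the peer has seen (this also spreads membership)."""
        for node, beat in other.heartbeats.items():
            if beat > self.heartbeats.get(node, -1):
                self.heartbeats[node] = beat
                self.last_change[node] = now

    def suspected_down(self, now: int, timeout: int) -> set:
        """Peers whose heartbeat has not increased for more than `timeout` rounds."""
        return {n for n, t in self.last_change.items()
                if n != self.name and now - t > timeout}


def gossip_round(nodes, alive, now, rng) -> None:
    """Every live node ticks, then swaps state with one randomly chosen live peer."""
    live = [n for n in nodes if n.name in alive]
    for node in live:
        node.tick(now)
    for node in live:
        peers = [p for p in live if p is not node]
        if peers:
            peer = rng.choice(peers)
            node.merge(peer, now)
            peer.merge(node, now)


rng = random.Random(7)
cluster = [GossipNode(f"n{i}") for i in range(5)]
alive = {n.name for n in cluster}
for now in range(1, 11):
    gossip_round(cluster, alive, now, rng)
alive.discard("n4")  # n4 crashes: it stops ticking and gossiping
for now in range(11, 31):
    gossip_round(cluster, alive, now, rng)
```

After n4 stops, no survivor ever sees its counter rise again. Given enough rounds, the survivors should converge on suspecting it while still seeing each other as alive.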

DATA ColumnFamilies. Keys and values. Speed via appending data with timestamps.
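"Appending data and timestamps" means a write never updates in place. It appends the new value with its timestamp, and a read reconciles by keeping the newest timestamp per column (last write wins). This also makes merging replicas easy: concatenate and reconcile. The sketch below is a hypothetical plain-Python illustration, not Cassandra's storage engine. It ignores tie-breaking for equal timestamps, deletions (tombstones) and compaction.

```python
from typing import Dict, List, Tuple

LogEntry = Tuple[str, int, str]  # (column, write timestamp, value)


def append_write(log: List[LogEntry], column: str, ts: int, value: str) -> None:
    """Writes never modify old data: they just append."""
    log.append((column, ts, value))


def read_row(log: List[LogEntry]) -> Dict[str, str]:
    """Reconcile an append-only log: for each column, the highest timestamp wins."""
    newest: Dict[str, Tuple[int, str]] = {}
    for column, ts, value in log:
        if column not in newest or ts > newest[column][0]:
            newest[column] = (ts, value)
    return {column: value for column, (_, value) in newest.items()}


log: List[LogEntry] = []
append_write(log, "email", 5, "new@example.com")
append_write(log, "email", 4, "old@example.com")  # arrives late, but is older
append_write(log, "city", 2, "Newark")
row = read_row(log)

# Merging two replicas' data is the same reconciliation over both logs.
replica_b: List[LogEntry] = [("city", 7, "Hoboken")]
merged = read_row(log + replica_b)
```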

KEY VALUE DYNAMO Replication. REST. Protocol Buffers for queries. Tunable consistency.

RIAK

Simple interface, high write-availability, linear scaling. REST API via HTTP: PUT, GET, DELETE, POST, etc. Or Protobufs for quicker, serialized data. "Hundreds of nodes".

DISTRIBUTED Consistent hashing, vector clocks, sloppy quorums, virtual nodes (not machines but lightweight processes; more like having eggs in many baskets, so it is easier to hand the eggs to other folks during a failure), hinted handoff ("please pass along"), replication. Request flow: request -> Riak node, which asks the other nodes; on each node, virtual node -> data store.

And then the answers return back up the stack.
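Vector clocks are how a Dynamo-style store can tell "this update replaced that one" apart from "these two updates happened concurrently". Each value carries a map of actor -> counter. If one clock is greater than or equal to the other in every entry, it descends from it. Otherwise the two values are concurrent siblings that need reconciling. This is a generic textbook sketch with hypothetical function names, not Riak's implementation, which adds details such as clock pruning.

```python
from typing import Dict

VClock = Dict[str, int]  # actor id -> counter


def increment(clock: VClock, actor: str) -> VClock:
    """Copy of `clock` with this actor's counter bumped (done on each update)."""
    new = dict(clock)
    new[actor] = new.get(actor, 0) + 1
    return new


def descends(a: VClock, b: VClock) -> bool:
    """True if `a` has seen everything `b` has (a >= b entry by entry)."""
    return all(a.get(actor, 0) >= n for actor, n in b.items())


def compare(a: VClock, b: VClock) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a newer"
    if descends(b, a):
        return "b newer"
    return "concurrent"  # siblings: neither update saw the other


def merge(a: VClock, b: VClock) -> VClock:
    """Entry-wise max, used when a client writes back a reconciled value."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in set(a) | set(b)}


v1 = increment({}, "x")    # first write, coordinated by x
v2a = increment(v1, "x")   # x updates it again
v2b = increment(v1, "y")   # meanwhile y updates the same v1 during a partition
resolved = increment(merge(v2a, v2b), "x")  # a client reconciles the siblings
```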

SERVERS "Just add more" servers. Ring architecture: all nodes are peers. Gossip protocols.

KEYS AND BUCKETS Riak can create keys automatically (and return the key to you).

http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true ^ gets all the keys
http://SERVER:PORT/riak/BUCKET?keys=stream ^ better for huge sets of data
You can store your code in a bucket.

LINKS curl … -H 'Link: </riak/BUCKET/KEY>; riaktag="tagname"'

Link walking ^ can create other structures.

HOMEWORK AND OTHER READINGS: GENERAL

Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf

Vogels' thoughts on eventual consistency: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Old-school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf

ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p

All your BASE: Dan Pritchett, "BASE: An ACID Alternative", http://queue.acm.org/detail.cfm?id=1394128

NoSQL Distilled by Sadalage and Fowler; Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONT'D

Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

HBase: The Definitive Guide by Lars George; HBase in Action by Dimiduk and Khurana; Hadoop: The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONT'D

• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com

• Nice video on system details on Safari by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True

• Riak Handbook: http://www.riakhandbook.com

READINGS FOR GRAPHS Graph Databases by Robinson, Webber and Eifrem. Mostly about Neo4j; uses Cypher throughout.

RELATIONAL DATABASE EXAMPLES

http://db-engines.com/en/ranking_trend/relational+dbms

  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 11: Nosql

FOWLER'S IMPEDANCE MISMATCH

Java objects vs rows in tables

What I have called Fowler's impedance mismatch is discussed in his book with Sadalage, NoSQL Distilled.

Most NoSQL beasties can store data in more interesting ways.
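To make the mismatch concrete, here is a minimal Python sketch. The `Order`/`LineItem` classes and the sample data are hypothetical and do not come from the talk. The same object is first flattened into two relational row sets that a JOIN has to stitch back together, and then stored whole as a single JSON document, which is how a document store would keep it.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class LineItem:
    sku: str
    qty: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)   # nested LineItem objects


def to_rows(order: Order) -> tuple:
    """Relational shape: one 'orders' row plus one 'line_items' row per item,
    linked by order_id; a JOIN is needed to rebuild the object."""
    orders = [(order.order_id, order.customer)]
    line_items = [(order.order_id, it.sku, it.qty) for it in order.items]
    return orders, line_items


def to_document(order: Order) -> str:
    """Document shape: the whole aggregate as one JSON value."""
    return json.dumps(asdict(order))


order = Order(1, "arthur", [LineItem("towel", 1), LineItem("tea", 2)])
orders, line_items = to_rows(order)
print(orders)        # [(1, 'arthur')]
print(line_items)    # [(1, 'towel', 1), (1, 'tea', 2)]
print(to_document(order))
```

The relational version needs a foreign key and a join to rebuild one `Order`. The document version round-trips in one read, at the cost of duplicating data if other aggregates need the same line items.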

CAP

Here because management loves to chat endlessly about it.

C is for Consistency: "This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time."

Most systems don't behave exactly like that.

A is for Availability: "For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. … even when severe network failures occur, every request must terminate."

I think everyone here understands this one :)

P is for Partition Tolerance: "In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another."

Quotes from "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services" by Gilbert and Lynch.

YOU CAN HAVE TWO

Consistency: the system may shut down or take a day to answer, but you will get the correct answer.

Availability: the system will always answer. You might get your checking balance from last year instead of today's balance, but you will get an answer.

It's like asking a research group versus asking folks in the pub. You can't have both :( Either you accept the write without knowing whether all the servers are up, OR you refuse it until you know they are. And partition tolerance is mandatory in distributed systems:

https://codahale.com/you-cant-sacrifice-partition-tolerance
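The "accept the write OR refuse it" choice can be sketched as a toy replicated register in Python. The `Replica`/`Register` names and the 'CP'/'AP' mode strings are made up for illustration. For simplicity, CP mode here demands that *all* replicas be reachable. Real systems usually settle for a quorum.

```python
class Replica:
    def __init__(self, name: str):
        self.name = name
        self.up = True      # set to False to simulate a partitioned node
        self.value = None


class Register:
    """Toy replicated register; mode is the hypothetical string 'CP' or 'AP'."""

    def __init__(self, replicas: list, mode: str):
        self.replicas = replicas
        self.mode = mode

    def write(self, value) -> int:
        reachable = [r for r in self.replicas if r.up]
        if self.mode == "CP" and len(reachable) < len(self.replicas):
            # Consistency first: refuse rather than let replicas diverge.
            raise RuntimeError("write refused: not all replicas reachable")
        # AP mode (or CP with everyone reachable): write to every reachable
        # replica; unreachable ones stay stale until some later repair.
        for r in reachable:
            r.value = value
        return len(reachable)


replicas = [Replica("a"), Replica("b"), Replica("c")]
replicas[2].up = False                       # a partition cuts off "c"
print(Register(replicas, "AP").write(42))    # 2 (replica c is now stale)
try:
    Register(replicas, "CP").write(43)
except RuntimeError as err:
    print(err)
```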

ONLY TWO: THE FINE PRINT

Only two at any moment in time :) For some systems (Cassandra, Riak) you can choose a different pair for each operation.

WHY WOULD ANYONE BE INCONSISTENT?

Speed under heavy concurrency: "good now is better than perfect later", i.e. don't block.

Handling "partition cases", i.e. part of the system/network is down.

DB CHEMISTRY - MORE BUZZ

Is it ACID or BASE?

ACID: Atomicity, Consistency, Isolation, Durability
BASE: Basically Available, Soft state, Eventually consistent

See "The Transaction Concept: Virtues and Limitations" by Jim Gray.

HOW TO DISTRIBUTE THE DATA

Option 1: shard
Option 2: replicate
Option 3: do both
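Here is a minimal Python sketch of option 3. It assumes a hypothetical four-node cluster that keeps two copies of every key. The key is hashed to pick its home shard, and then replicated onto the next node in the list. SHA-256 is used only as a stable, evenly spread hash, because Python's built-in `hash()` for strings changes between processes.

```python
import hashlib

NODES = ["node0", "node1", "node2", "node3"]   # hypothetical cluster
REPLICAS = 2                                   # copies of each key (assumption)


def shard_of(key: str, num_shards: int) -> int:
    """Stable hash of the key, mod the number of shards."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def placement(key: str) -> list:
    """Option 3: shard by hash, then replicate onto the following node(s)."""
    first = shard_of(key, len(NODES))
    return [NODES[(first + i) % len(NODES)] for i in range(REPLICAS)]


for k in ["user:1", "user:2", "order:77"]:
    print(k, placement(k))
```

Plain `hash % N` has a catch. If you change N, most keys move to a different node. That is the problem the consistent-hashing rings used by Cassandra and Riak are meant to soften.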

WHAT WOULD LINNAEUS SAY?

Key-Value
Graph DB
Document
Columnar (aka BigTable)

Disclaimer: heavy overlap

https://en.wikipedia.org/wiki/Linnaean_taxonomy

COLUMNAR STORES

Inspired by Google's Bigtable. Funky row/column setups.
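The Bigtable paper describes its data model as a sparse, sorted map from (row key, column, timestamp) to value. Below is a toy Python version of that shape. The row keys and 'family:qualifier' column names are invented for the example. A real store keeps rows sorted on disk, whereas this sketch sorts them at scan time.

```python
from collections import defaultdict

# Bigtable-style model: row key -> "family:qualifier" -> {timestamp: value}.
# Absent cells are simply not stored, so wide, sparse rows cost nothing extra.
table = defaultdict(lambda: defaultdict(dict))


def put(row: str, column: str, value: str, ts: int) -> None:
    table[row][column][ts] = value          # keep every version by timestamp


def get(row: str, column: str):
    """Newest version of a cell, or None if the cell was never written."""
    versions = table.get(row, {}).get(column)
    return versions[max(versions)] if versions else None


def scan(start: str, stop: str) -> list:
    """Row keys in [start, stop). Bigtable keeps rows sorted; we sort here."""
    return [r for r in sorted(table) if start <= r < stop]


put("com.example/a", "contents:html", "<html>v1", 1)
put("com.example/a", "contents:html", "<html>v2", 2)
put("com.example/b", "anchor:news.example", "Example", 1)
print(get("com.example/a", "contents:html"))   # <html>v2
print(get("com.example/b", "contents:html"))   # None
```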

COLUMNAR EXAMPLES

http://db-engines.com/en/ranking_trend/wide+column+store

KEY-VALUE STORES

Designed for speed (some are even memory-only) and high load. Global data model of key-values (surprise!). Ring partitioning and replication.

KEY VALUE EXAMPLES

http://db-engines.com/en/ranking_trend/key-value+store

DOCUMENT STORES

Similar to key-value, but the value is a document.

Documents are stored as JSON (or similar). Flexible schema. Some support keys/references/indices.

    {"date": [2016, 4, 1], "booktitle": "Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams"}
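Here is a minimal in-memory sketch of the idea in Python. Documents are stored as JSON strings under an id, and two documents in the same collection have different fields. A query matches on whatever fields happen to exist. The `insert`/`find` names are hypothetical. They mimic the flavor of document-store APIs without being any particular product's interface.

```python
import json

collection = {}   # doc_id -> JSON text; an in-memory stand-in for a store


def insert(doc_id: str, doc: dict) -> None:
    collection[doc_id] = json.dumps(doc)


def find(**criteria) -> list:
    """Documents whose top-level fields equal every criterion.
    There is no schema: a document missing a field simply doesn't match."""
    matches = []
    for raw in collection.values():
        doc = json.loads(raw)
        if all(name in doc and doc[name] == want for name, want in criteria.items()):
            matches.append(doc)
    return matches


insert("b1", {"date": [2016, 4, 1],
              "booktitle": "Hitchhiker's Guide to the Galaxy",
              "author": "Douglas Adams"})
insert("b2", {"booktitle": "Dirk Gently's Holistic Detective Agency",
              "author": "Douglas Adams",
              "series": "Dirk Gently"})       # different shape, same collection

print(len(find(author="Douglas Adams")))     # 2
print(find(series="Dirk Gently")[0]["booktitle"])
```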

DOCUMENT EXAMPLES

http://db-engines.com/en/ranking_trend/document+store

GRAPH DATABASES

Remember your data structures class in college?

Edges and vertices: both can hold data.

Reduces tough SQL queries to simple graph queries.

Easier to model: it "matches the whiteboard".

Relationships between vertices are first-class citizens.
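Here is a small Python sketch of a property graph, with invented people and 'KNOWS' edges that carry their own data. A breadth-first traversal answers "who is within two hops of alice?". In SQL, the same question would typically take a self-join per hop (or a recursive query). Graph databases such as Neo4j express this declaratively, for example in Cypher. This code only shows the underlying idea.

```python
from collections import deque

# Property graph: vertices and edges both carry data.
vertices = {
    "alice": {"role": "dev"},
    "bob": {"role": "dba"},
    "carol": {"role": "ops"},
    "dave": {"role": "dev"},
}
edges = [  # (from, to, edge properties)
    ("alice", "bob", {"type": "KNOWS", "since": 2010}),
    ("bob", "carol", {"type": "KNOWS", "since": 2014}),
    ("carol", "dave", {"type": "KNOWS", "since": 2015}),
]


def neighbors(v: str, rel: str) -> list:
    return [dst for src, dst, props in edges if src == v and props["type"] == rel]


def within_hops(start: str, rel: str, max_hops: int) -> dict:
    """Breadth-first search: vertex -> hop count, for everything reachable
    from start in at most max_hops steps along rel edges."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if depth[v] == max_hops:
            continue                      # don't expand past the hop limit
        for n in neighbors(v, rel):
            if n not in depth:
                depth[n] = depth[v] + 1
                queue.append(n)
    del depth[start]
    return depth


print(within_hops("alice", "KNOWS", 2))   # {'bob': 1, 'carol': 2}
devs = [v for v in within_hops("alice", "KNOWS", 3) if vertices[v]["role"] == "dev"]
print(devs)                               # ['dave']
```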

GRAPH DB EXAMPLES

http://db-engines.com/en/ranking_trend/graph+dbms

HBASE

NoSQL on top of Hadoop

SITS ON TOP OF HDFS

Name nodes, data nodes, replication, and the rest of that whole megillah.

Column-oriented. Handles 'wide', 'sparse' tables well. Fault tolerant. Supports Java, REST, Avro and Thrift clients.

All operations are atomic at the row level (with write-ahead logs for durability).
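Here is a toy Python sketch of the write-ahead-log idea. A whole-row mutation is appended to the log as a single record before it touches the in-memory store, so after a crash the row is replayed in full or not at all. The `Region` class and `crash_before_apply` flag are hypothetical. Real HBase also relies on per-row locks for atomicity, and its WAL lives in HDFS rather than in a Python list.

```python
import json


class Region:
    """Toy region: log the whole row mutation first, then apply it."""

    def __init__(self):
        self.wal = []        # stands in for the durable log (HDFS in HBase)
        self.memstore = {}   # row -> {column: value}; lost on a crash

    def put_row(self, row: str, cells: dict, crash_before_apply: bool = False) -> None:
        # 1. One log record covers every cell of this row's mutation.
        self.wal.append(json.dumps({"row": row, "cells": cells}))
        if crash_before_apply:
            return           # simulate dying after the log write
        # 2. Only then apply it in memory.
        self.memstore.setdefault(row, {}).update(cells)

    def recover(self) -> None:
        """After a crash, rebuild the memstore by replaying the log in order."""
        self.memstore = {}
        for record in self.wal:
            entry = json.loads(record)
            self.memstore.setdefault(entry["row"], {}).update(entry["cells"])


region = Region()
region.put_row("r1", {"cf:a": "1", "cf:b": "2"})
region.put_row("r2", {"cf:a": "x", "cf:b": "y"}, crash_before_apply=True)
print("r2" in region.memstore)   # False: the crash hit before the apply step
region.recover()
print(region.memstore["r2"])     # {'cf:a': 'x', 'cf:b': 'y'}: both cells together
```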

KIND OF SQL

Key-values: keys are arbitrary strings (byte arrays, strictly speaking), and a value is an entire row of data. No joins. Apache Phoenix adds a SQL layer with a JDBC interface.

COLUMN FAMILIES

A column's full name = family name + column qualifier (written family:qualifier). Each column family's performance settings are configured independently.

REGIONS

Looks like shards: each region holds a different key range, there is no overlap, and regions are spread across boxes.
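Finding which region serves a row key is a sorted-range lookup. Here is a minimal Python sketch using `bisect`, with made-up region boundaries and region server names. In HBase, the equivalent lookup goes through the `hbase:meta` table, and clients cache the result.

```python
import bisect

# Hypothetical layout: region i holds row keys in [REGION_STARTS[i], REGION_STARTS[i+1]).
REGION_STARTS = ["", "g", "n", "t"]          # "" marks the start of the key space
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]


def region_for(row_key: str) -> int:
    """Index of the last region whose start key is <= row_key."""
    return bisect.bisect_right(REGION_STARTS, row_key) - 1


for key in ["apple", "grape", "melon", "zebra"]:
    print(key, "-> region", region_for(key), "on", REGION_SERVERS[region_for(key)])
```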

CASSANDRA

Tunable NoSQL

CAP WITH QUORUMS/KNOB TWEAKING

Symmetric peer-to-peer. Linearly scalable. Replication. Eventual consistency. Partitioning.

CAP WITH QUORUMS/KNOB TWEAKING

Some systems let you choose per operation. Three knobs:

• replication amount: how many copies of the data (N)
• how many successful writes == "your write to the database is done" (W)
• how many successful reads out of the full set == "here is your data" (R)

The higher the values, the longer the wait.
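The three knobs are usually called N (replicas), W (write acknowledgements) and R (read replies). The Python sketch below illustrates the classic Dynamo-style rule of thumb: if R + W > N, every read quorum overlaps every write quorum, so a read reaches at least one replica holding the latest acknowledged write. The brute-force `quorums_overlap` check confirms the rule for small N. `quorum_read` is a hypothetical helper that returns the newest (version, value) pair among the first R replies.

```python
from itertools import combinations


def quorums_overlap(n: int, r: int, w: int) -> bool:
    """Brute force: does every R-sized read set share a replica with every
    W-sized write set?"""
    replicas = range(n)
    return all(set(reads) & set(writes)
               for reads in combinations(replicas, r)
               for writes in combinations(replicas, w))


def quorum_read(replies: list, r: int):
    """Hypothetical helper: given (version, value) replies in arrival order,
    wait for R of them and return the value with the highest version."""
    if len(replies) < r:
        raise TimeoutError(f"only {len(replies)} of {r} replicas answered")
    return max(replies[:r])[1]


# The rule of thumb R + W > N matches the brute-force check for small N.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_overlap(n, r, w) == (r + w > n)

print(quorum_read([(2, "new"), (1, "old"), (1, "old")], r=2))   # new
```

With N=3, R=W=1 is fastest but may return stale data, and R=W=3 waits on every replica. R=W=2 sits in between and still satisfies R + W > N.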

GOSSIP AND PEERING

Who's up? Passing requests along. Handling missing nodes.
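Here is gossip in a nutshell, as a Python sketch. Every node bumps its own heartbeat counter. When two nodes gossip, each keeps the higher counter it has seen for every peer. News spreads node to node without any central coordinator. The `Node` class is invented for illustration. Real failure detectors are more sophisticated; Cassandra, for example, uses a phi accrual detector.

```python
class Node:
    """Keeps the freshest heartbeat counter it has heard for every peer."""

    def __init__(self, name: str, peers: list):
        self.name = name
        self.heartbeats = {p: 0 for p in peers}   # includes this node itself

    def tick(self) -> None:
        self.heartbeats[self.name] += 1           # "I'm still alive"

    def gossip_with(self, other: "Node") -> None:
        # Both sides end up with the higher counter seen for each peer.
        for p in self.heartbeats:
            newest = max(self.heartbeats[p], other.heartbeats[p])
            self.heartbeats[p] = other.heartbeats[p] = newest


names = ["a", "b", "c", "d"]
nodes = {n: Node(n, names) for n in names}
nodes["a"].tick()
nodes["a"].gossip_with(nodes["b"])   # a tells b
nodes["b"].gossip_with(nodes["c"])   # b passes it on to c
print(nodes["c"].heartbeats["a"])    # 1: c heard about a without talking to it
print(nodes["d"].heartbeats["a"])    # 0: the news hasn't reached d yet
```

If a peer's counter stops rising for long enough, the other nodes can treat it as down.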

DATA

Column families, keys and values. Speed via appending data with timestamps.
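"Appending data with timestamps" means that writes never update in place. Each write adds a new timestamped cell, and the newest timestamp wins at read time. Compaction later discards the losing versions. Here is a minimal Python sketch of that last-write-wins read. The names are hypothetical, and ties are broken by comparing values purely to keep the sketch deterministic.

```python
# (row, column) -> list of (timestamp, value) cells, in arrival order.
cells = {}


def write(row: str, column: str, value: str, ts: int) -> None:
    """Writes never update in place: just append a timestamped cell."""
    cells.setdefault((row, column), []).append((ts, value))


def read(row: str, column: str):
    """Last write wins: return the value with the highest timestamp.
    Equal timestamps fall back to comparing values, only to stay deterministic."""
    versions = cells.get((row, column))
    return max(versions)[1] if versions else None


write("user1", "email", "old@example.com", ts=100)
write("user1", "email", "new@example.com", ts=200)
write("user1", "email", "late@example.com", ts=150)   # arrives last, but is older
print(read("user1", "email"))   # new@example.com
```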

KEY VALUE: DYNAMO

Replication. REST. Protocol Buffers for queries. Tunable consistency.

RIAK

Simple interface, high write availability, linear scaling.
REST API via HTTP: PUT, GET, DELETE, POST, etc.
Or Protocol Buffers for quicker, serialized data.
"Hundreds of nodes"

DISTRIBUTED

Consistent hashing, vector clocks, sloppy quorums, virtual nodes (not machines but lightweight processes; more like having your eggs in many baskets, so it is easier to hand the eggs to other folks during a failure), hinted handoff ("please pass this along"), replication.

    Request -> riak  <- ask other nodes ->  riak
                |                             |
            virt node                     virt node
                |                             |
            data store                    data store

And then return answers back up the stack.
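Here is a minimal Python sketch of consistent hashing with virtual nodes. Each physical node is hashed onto the ring many times. A key's replicas are the first N distinct physical nodes clockwise from the key's hash, which Dynamo calls its preference list. This is the generic token-per-virtual-node variant. Riak itself splits its SHA-1 ring into a fixed number of equal partitions that vnodes claim, so treat this as an illustration of the idea rather than Riak's code. The node names and the `vnodes=64` default are assumptions.

```python
import bisect
import hashlib


def ring_hash(s: str) -> int:
    return int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Consistent hashing with virtual nodes: each physical node appears at
    many points on the ring, so its keys are spread over many neighbours."""

    def __init__(self, nodes: list, vnodes: int = 64):
        self.points = sorted((ring_hash(f"{n}#{i}"), n)
                             for n in nodes for i in range(vnodes))
        self.hashes = [h for h, _ in self.points]

    def preference_list(self, key: str, n: int = 3) -> list:
        """The first n distinct physical nodes clockwise from the key."""
        start = bisect.bisect(self.hashes, ring_hash(key))
        owners = []
        for step in range(len(self.points)):
            node = self.points[(start + step) % len(self.points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == n:
                    break
        return owners


ring = Ring(["a", "b", "c", "d", "e"])
print(ring.preference_list("user:42"))   # three distinct node names

keys = [f"k{i}" for i in range(1000)]
before = {k: ring.preference_list(k, 1)[0] for k in keys}
smaller = Ring(["a", "b", "c", "e"])      # node "d" leaves
moved = [k for k in keys if smaller.preference_list(k, 1)[0] != before[k]]
print(all(before[k] == "d" for k in moved))   # True: only d's keys moved
```

When a node leaves, only the keys it owned move. Every other key stays put, and that is what keeps "just add more servers" cheap.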

SERVERS

"Just add more" servers. Ring architecture: all nodes are peers. Gossip protocols.

KEYS AND BUCKETS

Riak can create keys automatically (and return the new key to you).

http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true     (gets all the keys)
http://SERVER:PORT/riak/BUCKET?keys=stream   (better for huge sets of data)

You can store your code in a bucket.
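Here is a Python sketch of the legacy `/riak/BUCKET/KEY` HTTP interface, using only the standard library. It builds `urllib.request.Request` objects without sending them. It assumes a local node on `localhost:8098` (Riak's usual HTTP port), so adjust that for your setup. Newer Riak versions prefer `/buckets/BUCKET/keys/KEY` URLs, so check the docs for your version.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "http://localhost:8098"   # assumption: a local node on Riak's usual HTTP port


def store(bucket: str, value: bytes, key=None,
          content_type: str = "text/plain") -> Request:
    """PUT to /riak/BUCKET/KEY, or POST to /riak/BUCKET and let Riak pick the key."""
    headers = {"Content-Type": content_type}
    if key is None:
        return Request(f"{BASE}/riak/{quote(bucket)}", data=value,
                       headers=headers, method="POST")
    return Request(f"{BASE}/riak/{quote(bucket)}/{quote(key)}", data=value,
                   headers=headers, method="PUT")


def fetch(bucket: str, key: str) -> Request:
    return Request(f"{BASE}/riak/{quote(bucket)}/{quote(key)}", method="GET")


def list_keys(bucket: str, stream: bool = False) -> Request:
    """keys=true returns every key in one response; keys=stream sends them in chunks."""
    query = urlencode({"keys": "stream" if stream else "true"})
    return Request(f"{BASE}/riak/{quote(bucket)}?{query}", method="GET")


req = store("animals", b"woof", key="ace")
print(req.get_method(), req.full_url)   # PUT http://localhost:8098/riak/animals/ace
```

To actually send a request, pass it to `urllib.request.urlopen(req)`. For a POST without a key, Riak is expected to report the generated key in the `Location` response header.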

LINKS

curl blah -H 'Link: </riak/BUCKET/KEY>; riaktag="tagname"'

Link walking can create other structures.
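Here is link walking, sketched in Python over an in-memory stand-in for Riak. The bucket names, keys and tags are invented. Each step filters an object's links by bucket and tag, with `_` as a wildcard. This mirrors the `bucket,tag,keep` segments of Riak's legacy link-walking URLs, but the `keep` flag is ignored here. `link_header` shows the header format from the slide.

```python
# In-memory stand-in: (bucket, key) -> list of links (bucket, key, tag).
links = {
    ("people", "eric"): [("people", "ann", "friend"), ("cities", "nyc", "lives_in")],
    ("people", "ann"): [("people", "bob", "friend")],
    ("people", "bob"): [],
    ("cities", "nyc"): [],
}


def link_header(bucket: str, key: str, tag: str) -> str:
    """Value for the Link header when storing an object that points elsewhere."""
    return f'</riak/{bucket}/{key}>; riaktag="{tag}"'


def walk(start: tuple, *steps: tuple) -> list:
    """Follow links step by step; each step is (bucket, tag) and "_" matches
    anything. The keep flag from Riak's URL form is not modelled here."""
    current = [start]
    for bucket, tag in steps:
        current = [(b, k)
                   for obj in current
                   for b, k, t in links[obj]
                   if bucket in ("_", b) and tag in ("_", t)]
    return current


print(link_header("people", "ann", "friend"))   # </riak/people/ann>; riaktag="friend"
print(walk(("people", "eric"), ("_", "friend"), ("_", "friend")))   # [('people', 'bob')]
```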

HOMEWORK AND OTHER READINGS: GENERAL

Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf

Vogels' thoughts on eventual consistency: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Old-school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf

ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p

All your BASE: Dan Pritchett, "BASE: An Acid Alternative", http://queue.acm.org/detail.cfm?id=1394128

NoSQL Distilled by Sadalage and Fowler
Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONT'D

Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

HBase: The Definitive Guide by Lars George
HBase in Action by Dimiduk and Khurana
Hadoop: The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONT'D

• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com
• Nice video on system details, on Safari, by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True
• Riak Handbook: http://www.riakhandbook.com

READINGS FOR GRAPHS

Graph Databases by Robinson, Webber and Eifrem. Mostly about Neo4j; uses Cypher throughout.

RELATIONAL DATABASE EXAMPLES

http://db-engines.com/en/ranking_trend/relational+dbms

Page 14: Nosql

ONLY TWO THE FINE PRINT Only two at any moment in time ) For some systems you can choose different pairs for each operation (Cassandra Riak)

WHY WOULD ANYONE BE INCONSISTENT Speed while highly concurrent ldquogood now better is than perfect laterrdquo ie donrsquot block

Handling ldquopartition casesrdquo ie part of the systemnetwork is down

DB CHEMISTRY ndash MORE BUZZ Is it ACID or BASE Atomicity Consistency Isolation Durability Basically Available Soft-state Eventually consistent

See ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray

HOW TO DISTRIBUTE THE DATA

Option 1 shard Option 2 replicate Option 3 do both

WHAT WOULD LINNAEUS SAY

Key-Value

httpsenwikipediaorgwikiLinnaean_taxonomy

Graph DB

Document

Columnar (aka BigTable)

Disclaimer heavy overlap

COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups

COLUMNAR EXAMPLES

httpdb-enginescomenranking_trendwide+column+store

KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication

KEY VALUE EXAMPLES

httpdb-enginescomenranking_trendkey-value+store

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old-school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf

ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83.p

All your BASE: Dan Pritchett, "BASE: An Acid Alternative", http://queue.acm.org/detail.cfm?id=1394128

NoSQL Distilled by Sadalage and Fowler; Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONT'D

Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en/archive/bigtable-osdi06.pdf

HBase: The Definitive Guide by Lars George; HBase in Action by Dimiduk and Khurana; Hadoop: The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONT'D

• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com

• Nice video on system details, on Safari, by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True

• Riak Handbook: http://www.riakhandbook.com

READINGS FOR GRAPHS: Graph Databases by Robinson, Webber, and Eifrem. Mostly about Neo4j; uses Cypher throughout.

RELATIONAL DATABASE EXAMPLES

http://db-engines.com/en/ranking_trend/relational+dbms

Page 15: Nosql

WHY WOULD ANYONE BE INCONSISTENT Speed while highly concurrent ldquogood now better is than perfect laterrdquo ie donrsquot block

Handling ldquopartition casesrdquo ie part of the systemnetwork is down

DB CHEMISTRY ndash MORE BUZZ Is it ACID or BASE Atomicity Consistency Isolation Durability Basically Available Soft-state Eventually consistent

See ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray

HOW TO DISTRIBUTE THE DATA

Option 1 shard Option 2 replicate Option 3 do both

WHAT WOULD LINNAEUS SAY

Key-Value

httpsenwikipediaorgwikiLinnaean_taxonomy

Graph DB

Document

Columnar (aka BigTable)

Disclaimer heavy overlap

COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups

COLUMNAR EXAMPLES

httpdb-enginescomenranking_trendwide+column+store

KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication

KEY VALUE EXAMPLES

httpdb-enginescomenranking_trendkey-value+store

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 16: Nosql

DB CHEMISTRY ndash MORE BUZZ Is it ACID or BASE Atomicity Consistency Isolation Durability Basically Available Soft-state Eventually consistent

See ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray

HOW TO DISTRIBUTE THE DATA

Option 1 shard Option 2 replicate Option 3 do both

WHAT WOULD LINNAEUS SAY

Key-Value

httpsenwikipediaorgwikiLinnaean_taxonomy

Graph DB

Document

Columnar (aka BigTable)

Disclaimer heavy overlap

COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups

COLUMNAR EXAMPLES

httpdb-enginescomenranking_trendwide+column+store

KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication

KEY VALUE EXAMPLES

httpdb-enginescomenranking_trendkey-value+store

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 17: Nosql

HOW TO DISTRIBUTE THE DATA

Option 1 shard Option 2 replicate Option 3 do both

WHAT WOULD LINNAEUS SAY

Key-Value

httpsenwikipediaorgwikiLinnaean_taxonomy

Graph DB

Document

Columnar (aka BigTable)

Disclaimer heavy overlap

COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups

COLUMNAR EXAMPLES

httpdb-enginescomenranking_trendwide+column+store

KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication

KEY VALUE EXAMPLES

httpdb-enginescomenranking_trendkey-value+store

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 18: Nosql

WHAT WOULD LINNAEUS SAY

Key-Value

httpsenwikipediaorgwikiLinnaean_taxonomy

Graph DB

Document

Columnar (aka BigTable)

Disclaimer heavy overlap

COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups

COLUMNAR EXAMPLES

httpdb-enginescomenranking_trendwide+column+store

KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication

KEY VALUE EXAMPLES

httpdb-enginescomenranking_trendkey-value+store

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 19: Nosql

COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups

COLUMNAR EXAMPLES

httpdb-enginescomenranking_trendwide+column+store

KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication

KEY VALUE EXAMPLES

httpdb-enginescomenranking_trendkey-value+store

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking: following links lets you build other structures on top of plain key-value data.
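Here is an in-memory sketch of link walking, assuming each stored object carries a list of (bucket, key, tag) links. Each walk step filters on bucket and tag, loosely mirroring Riak's bucket,tag steps; None means "any." The store layout and the function names link_header and walk are hypothetical.

```python
def link_header(links):
    """Format (bucket, key, tag) triples as a Riak-style Link header value."""
    return ", ".join(f'</riak/{bucket}/{key}>; riaktag="{tag}"' for bucket, key, tag in links)


def walk(store, start, steps):
    """Follow links step by step; each step is (bucket, tag), where None matches anything."""
    frontier = [start]
    for want_bucket, want_tag in steps:
        next_frontier = []
        for obj_id in frontier:
            for bucket, key, tag in store.get(obj_id, {}).get("links", []):
                if want_bucket in (None, bucket) and want_tag in (None, tag):
                    next_frontier.append((bucket, key))
        frontier = next_frontier
    return frontier
```

In the test below, two hops ("friends of ann," then "books they read") turn flat key-value objects into a small graph.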

HOMEWORK AND OTHER READINGS: GENERAL

Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf

Vogels' thoughts on "Eventually Consistent": http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Old-school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf

ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your BASE: Dan Pritchett, "BASE: An Acid Alternative", http://queue.acm.org/detail.cfm?id=1394128

NoSQL Distilled by Sadalage and Fowler; Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONT'D

Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

HBase: The Definitive Guide by Lars George; HBase in Action by Dimiduk and Khurana; Hadoop: The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONT'D

• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com

• Nice video on system details on Safari by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True

• Riak Handbook: http://www.riakhandbook.com

READINGS FOR GRAPHS: Graph Databases by Robinson, Webber, and Eifrem. Mostly about Neo4j; uses Cypher throughout.

RELATIONAL DATABASE EXAMPLES

http://db-engines.com/en/ranking_trend/relational+dbms

COLUMNAR EXAMPLES

http://db-engines.com/en/ranking_trend/wide+column+store

KEY-VALUE STORES: Designed for speed (some are even memory-only). High load. A global data model of keys and values (surprise!). Ring partitioning and replication.

KEY VALUE EXAMPLES

http://db-engines.com/en/ranking_trend/key-value+store

DOCUMENT STORES: Similar to key-value, but the value is a document.

Documents are stored as JSON (or similar). Flexible schema. Some support keys, references, and indices.

{ "date": [2016, 4, 1], "booktitle": "Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams" }
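Here is a sketch of what "flexible schema, some support indices" means in practice. It assumes a toy in-memory store that accepts any JSON object and can optionally build a secondary index on one field. ToyDocStore and its methods are hypothetical and are not any product's API. Indexed fields are assumed to hold scalars or flat arrays, such as the date above.

```python
import json
from collections import defaultdict


def _key(value):
    # JSON arrays (like the date above) are unhashable lists; use tuples as index keys.
    return tuple(value) if isinstance(value, list) else value


class ToyDocStore:
    """Hypothetical in-memory document store; doc ids are assumed to be unique."""

    def __init__(self):
        self.docs = {}      # doc id -> parsed JSON object
        self.indexes = {}   # field name -> {indexed value: set of doc ids}

    def insert(self, doc_id, json_text):
        # Flexible schema: any JSON object is accepted, whatever fields it has.
        doc = json.loads(json_text)
        self.docs[doc_id] = doc
        for field, index in self.indexes.items():
            if field in doc:
                index[_key(doc[field])].add(doc_id)

    def create_index(self, field):
        # Build a secondary index over the documents that already exist.
        index = defaultdict(set)
        for doc_id, doc in self.docs.items():
            if field in doc:
                index[_key(doc[field])].add(doc_id)
        self.indexes[field] = index

    def find(self, field, value):
        if field in self.indexes:
            ids = self.indexes[field].get(_key(value), set())  # indexed lookup
        else:
            ids = {i for i, d in self.docs.items() if d.get(field) == value}  # full scan
        return sorted(ids)
```

Without an index, find has to scan every document. With one, it becomes a dictionary lookup, which is the trade-off behind "some support indices."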

DOCUMENT EXAMPLES

http://db-engines.com/en/ranking_trend/document+store

GRAPH DATABASES: Remember your data structures class in college?

Edges and vertices: both can hold data.

Reduces tough SQL queries to simple graph queries.

Easier to model: it "matches the whiteboard."

Relationships between vertices are first-class.
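In a graph database, following relationships means walking adjacency lists, whereas a relational schema needs one self-join per hop. Here is a sketch, assuming a tiny in-memory property graph in which vertices and edges both carry a dict of data. ToyGraph and within are hypothetical names; in Neo4j you would express the same idea with a Cypher variable-length pattern.

```python
from collections import defaultdict, deque


class ToyGraph:
    """Hypothetical property graph: vertices and edges both hold data."""

    def __init__(self):
        self.vertices = {}               # vertex id -> properties
        self.edges = defaultdict(list)   # vertex id -> [(label, target id, properties)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        self.edges[src].append((label, dst, props))

    def within(self, start, label, max_hops):
        """Vertices reachable from start via `label` edges in at most max_hops hops (BFS)."""
        seen = {start}
        found = set()
        queue = deque([(start, 0)])
        while queue:
            vid, depth = queue.popleft()
            if depth == max_hops:
                continue
            for edge_label, dst, _ in self.edges[vid]:
                if edge_label == label and dst not in seen:
                    seen.add(dst)
                    found.add(dst)
                    queue.append((dst, depth + 1))
        return found
```

"Friends of friends" is within(person, "KNOWS", 2). The seen set keeps cycles in the graph from looping forever.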

GRAPH DB EXAMPLES

http://db-engines.com/en/ranking_trend/graph+dbms

HBASE: NoSQL on top of Hadoop

SITS ON TOP OF HDFS: Name nodes, data nodes, replication, and the rest of that whole megillah.

Column-oriented. Handles "wide," "sparse" tables well. Fault tolerant. Supports Java, REST, Avro, and Thrift.

All operations are atomic at the row level (writes go through a write-ahead log).
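Here is a sketch of how per-row locking and a write-ahead log (WAL) fit together. It assumes a plain list stands in for the durable log file and a dict stands in for the in-memory store. ToyRegionServer and its methods are hypothetical. In this sketch, the row lock is what makes a multi-column update all-or-nothing to readers, while the log is what lets the state be rebuilt after a crash.

```python
import threading
from collections import defaultdict


class ToyRegionServer:
    """Hypothetical sketch of row-level atomic writes plus a write-ahead log."""

    def __init__(self, wal=None):
        # A plain list stands in for the durable log file.
        self.wal = wal if wal is not None else []
        self.memstore = defaultdict(dict)              # row -> {column: value}
        self._row_locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()
        # Crash recovery: rebuild in-memory state by replaying the log in order.
        for row, cells in self.wal:
            self.memstore[row].update(cells)

    def _lock_for(self, row):
        with self._locks_guard:                        # create each row's lock exactly once
            return self._row_locks[row]

    def put_row(self, row, cells):
        """Apply several column updates to one row as a single unit."""
        with self._lock_for(row):
            self.wal.append((row, dict(cells)))        # 1. log the whole mutation first
            self.memstore[row].update(cells)           # 2. then apply it in memory

    def get_row(self, row):
        with self._lock_for(row):                      # never observe a half-applied put
            return dict(self.memstore.get(row, {}))
```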

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 21: Nosql

KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication

KEY VALUE EXAMPLES

httpdb-enginescomenranking_trendkey-value+store

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 22: Nosql

KEY VALUE EXAMPLES

httpdb-enginescomenranking_trendkey-value+store

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 23: Nosql

DOCUMENT STORES Similar to key-value but the value is a document

Document is stored in json (or similar) Flexible schema Some support keysreferencesindices

ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 24: Nosql

DOCUMENT EXAMPLES

httpdb-enginescomenranking_trenddocument+store

GRAPH DATABASES Remember your data structures class in college

Edges and vertices ndash both can hold data

Reduces tough sql queries to simple graph queries

Easier to model ndash lsquomatches the whiteboardrsquo

Relationships between vertices are first class

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

GRAPH DATABASES

Remember graphs from your data structures class in college?

• Vertices and edges: both can hold data
• Many queries that are tough in SQL (long chains of joins) become simple graph traversals
• Easier to model: "matches the whiteboard"
• Relationships between vertices are first-class
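To illustrate how tough SQL queries become simple graph queries, here is a minimal Python sketch over a small in-memory graph. The names and properties are invented. In SQL, friends-of-friends usually means joining a relationship table to itself. Here it becomes a two-hop breadth-first traversal. A real graph database would express this in its own query language (Neo4j uses Cypher), not in application code.

```python
from collections import deque

# Hypothetical "knows" graph. Vertices carry properties...
VERTICES = {
    "alice": {"role": "sysadmin"},
    "bob": {"role": "developer"},
    "carol": {"role": "dba"},
    "dave": {"role": "developer"},
    "erin": {"role": "manager"},
}
# ...and so do edges. Adjacency list: vertex -> {neighbor: edge properties}.
EDGES = {
    "alice": {"bob": {"since": 2012}, "carol": {"since": 2014}},
    "bob": {"alice": {"since": 2012}, "dave": {"since": 2015}},
    "carol": {"alice": {"since": 2014}, "erin": {"since": 2011}},
    "dave": {"bob": {"since": 2015}},
    "erin": {"carol": {"since": 2011}},
}


def within_hops(start, max_hops):
    """Breadth-first search: {vertex: hop count} for everything reachable in <= max_hops."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue
        for neighbor in EDGES.get(v, {}):
            if neighbor not in dist:
                dist[neighbor] = dist[v] + 1
                queue.append(neighbor)
    return dist


def friends_of_friends(person):
    """Exactly two hops away: in SQL this is typically a self-join on a friendships table."""
    return sorted(v for v, d in within_hops(person, 2).items() if d == 2)


print(friends_of_friends("alice"))                                  # ['dave', 'erin']
print([VERTICES[v]["role"] for v in friends_of_friends("alice")])   # ['developer', 'manager']
```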

GRAPH DB EXAMPLES

http://db-engines.com/en/ranking_trend/graph+dbms

HBASE

NoSQL on top of Hadoop

SITS ON TOP OF HDFS

• NameNodes
• DataNodes
• Replication
• And the rest of that whole megillah

• Column-oriented
• Handles "wide", "sparse" tables well
• Fault tolerant
• Client access via Java, REST, Avro, and Thrift

All operations on a single row are atomic; a write-ahead log (WAL) makes them durable and recoverable after a crash.
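The row-level guarantee can be sketched in Python as a toy model. This is an illustration only, not HBase's actual code. A per-row lock makes a multi-cell write to one row all-or-nothing for readers. The whole mutation is appended to a write-ahead log before it is applied, so state can be rebuilt by replaying the log after a crash. Here the "log" is just an in-memory list; a real WAL lives on durable storage (HDFS, in HBase's case).

```python
import threading
from collections import defaultdict


class ToyRowStore:
    """Toy illustration, not HBase's implementation: row-level atomicity plus a write-ahead log."""

    def __init__(self):
        self.wal = []                                  # append-only log (a real WAL is on durable storage)
        self.rows = {}                                 # row key -> {column: value}
        self._row_locks = defaultdict(threading.Lock)  # one lock per row key
        self._guard = threading.Lock()                 # protects creation of per-row locks

    def _lock(self, row):
        with self._guard:
            return self._row_locks[row]

    def put(self, row, cells):
        """Write several cells of one row; readers see all of them or none of them."""
        with self._lock(row):
            self.wal.append((row, dict(cells)))        # 1. log the whole mutation first
            updated = dict(self.rows.get(row, {}))
            updated.update(cells)
            self.rows[row] = updated                   # 2. then publish the new row in one step

    def get(self, row):
        with self._lock(row):
            return dict(self.rows.get(row, {}))

    @classmethod
    def recover(cls, wal):
        """After a crash, rebuild in-memory state by replaying the log in order."""
        store = cls()
        for row, cells in wal:
            store.put(row, cells)
        return store


store = ToyRowStore()
store.put("user#42", {"info:name": "Eric", "info:city": "NJ"})
rebuilt = ToyRowStore.recover(store.wal)
print(rebuilt.get("user#42") == store.get("user#42"))   # True
```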

KIND OF SQL

• Key-values: keys are arbitrary strings; values are an entire row of data
• No joins
• Apache Phoenix adds SQL on top, with a JDBC interface

COLUMN FAMILIES

• A column's full name = family name & column qualifier
• Each column family's performance settings are configured independently
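Here is a minimal sketch of the HBase-style data model. It makes a few simplifying assumptions:

• A table is a nested map: row key -> "family:qualifier" -> {timestamp: value}.
• Families must be declared up front.
• Each family carries its own setting. Here that is only a hypothetical max-versions count, loosely modeled on HBase's per-family VERSIONS setting.

Missing cells are simply absent, which is why wide, sparse tables are cheap.

```python
import time


class ToyTable:
    """Toy HBase-style data model: row key -> "family:qualifier" -> {timestamp: value}.

    Column families are declared up front, each with its own settings (here just
    how many versions to keep); qualifiers inside a family are free-form.
    """

    def __init__(self, max_versions_per_family):
        self.max_versions = dict(max_versions_per_family)
        self.rows = {}

    def put(self, row, column, value, ts=None):
        family = column.split(":", 1)[0]
        if family not in self.max_versions:
            raise KeyError(f"unknown column family: {family}")
        versions = self.rows.setdefault(row, {}).setdefault(column, {})
        versions[time.time_ns() if ts is None else ts] = value
        # Per-family setting: drop all but the newest N versions of this cell.
        for old in sorted(versions)[:-self.max_versions[family]]:
            del versions[old]

    def get(self, row, column):
        """Newest version of a cell; absent cells simply do not exist (sparse storage)."""
        versions = self.rows.get(row, {}).get(column)
        return versions[max(versions)] if versions else None


table = ToyTable({"info": 3, "stats": 1})
table.put("user#42", "info:name", "Eric", ts=1)
table.put("user#42", "info:name", "Eric Marshall", ts=2)
table.put("user#42", "stats:logins", 7, ts=1)
table.put("user#42", "stats:logins", 8, ts=2)
print(table.get("user#42", "info:name"))       # Eric Marshall
print(table.rows["user#42"]["stats:logins"])   # {2: 8}
```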

REGIONS

Look like shards: different key ranges per box, no overlap
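Regions are contiguous, non-overlapping key ranges. Finding the region that holds a row key is therefore a binary search over sorted region start keys. Here is a minimal sketch, using a hypothetical four-region layout and made-up region server names:

```python
import bisect

# Hypothetical layout: region i owns row keys in [REGION_STARTS[i], REGION_STARTS[i + 1]).
# The first region starts at the empty key, so every row key has exactly one home.
REGION_STARTS = ["", "g", "n", "t"]
REGION_SERVERS = ["rs1", "rs2", "rs3", "rs1"]   # hypothetical region servers


def locate(row_key):
    """Return (region index, server) for a row key: the rightmost region start <= key."""
    i = bisect.bisect_right(REGION_STARTS, row_key) - 1
    return i, REGION_SERVERS[i]


for key in ["apple", "g", "mango", "zebra"]:
    print(key, locate(key))
# apple (0, 'rs1')
# g (1, 'rs2')
# mango (1, 'rs2')
# zebra (3, 'rs1')
```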

CASSANDRA

Tunable NoSQL

CAP WITH QUORUMS / KNOB TWEAKING

• Symmetric peer-to-peer
• Linearly scalable
• Replication
• Eventual consistency
• Partitioning

CAP WITH QUORUMS / KNOB TWEAKING

Some systems let you choose per operation. Three knobs:

• Replication factor: how many copies of each piece of data are kept
• Write quorum: how many successful replica writes == "your write to the database is done"
• Read quorum: how many replicas, out of the full set, must answer == "here is your data"

The higher the values, the longer the wait.
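The three knobs are often written N (replication factor), W (write quorum) and R (read quorum). This minimal Python sketch assumes strict quorums, with no sloppy quorums or hinted handoff. When R + W > N, every read set overlaps every write set. So at least one replica that answers a read holds the latest acknowledged write. The brute-force check confirms that rule for small N. Raising W or R means waiting for more replicas and tolerating fewer failures.

```python
from itertools import combinations


def quorum_summary(n, w, r):
    """Describe an (N, W, R) choice for a Dynamo-style store with strict quorums."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("need 1 <= W <= N and 1 <= R <= N")
    return {
        # R + W > N: every read quorum shares a replica with every write quorum,
        # so a read can see the latest acknowledged write.
        "reads_see_latest_write": r + w > n,
        # How many replicas can be down while writes / reads still succeed.
        "writes_survive_down": n - w,
        "reads_survive_down": n - r,
    }


def always_overlap(n, w, r):
    """Brute-force check: does every W-subset of replicas intersect every R-subset?"""
    replicas = range(n)
    return all(set(ws) & set(rs)
               for ws in combinations(replicas, w)
               for rs in combinations(replicas, r))


print(quorum_summary(3, 2, 2))
# {'reads_see_latest_write': True, 'writes_survive_down': 1, 'reads_survive_down': 1}
print(quorum_summary(3, 1, 1))
# {'reads_see_latest_write': False, 'writes_survive_down': 2, 'reads_survive_down': 2}
```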

GOSSIP AND PEERING

• Who's up?
• Passing requests along
• Handling missing nodes
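Here is a toy sketch of heartbeat gossip. The names are hypothetical and it follows no particular system's protocol. It works like this:

• Every node bumps its own counter each round.
• Each node swaps its table of highest-seen counters with one random peer.
• A node suspects any peer whose counter has not gone up for a while.

That covers "who's up" and "handling missing nodes" with no central coordinator.

```python
import random


class Node:
    """Toy gossip member: remembers the highest heartbeat seen for every peer."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.heartbeats = {name: 0}   # peer -> highest heartbeat counter seen
        self.last_seen = {name: 0}    # peer -> round in which that counter last went up

    def tick(self, rnd):
        self.heartbeats[self.name] += 1
        self.last_seen[self.name] = rnd

    def merge(self, view, rnd):
        for peer, beat in view.items():
            if beat > self.heartbeats.get(peer, -1):
                self.heartbeats[peer] = beat
                self.last_seen[peer] = rnd

    def suspects(self, rnd, timeout):
        """Peers whose heartbeat has not advanced for more than `timeout` rounds."""
        return sorted(p for p, seen in self.last_seen.items() if rnd - seen > timeout)


def gossip_round(nodes, rnd, rng):
    live = [n for n in nodes if n.alive]
    for node in live:
        node.tick(rnd)
    for node in live:
        partners = [p for p in live if p is not node]
        if partners:
            partner = rng.choice(partners)
            partner.merge(node.heartbeats, rnd)   # push my view
            node.merge(partner.heartbeats, rnd)   # pull theirs


rng = random.Random(42)
nodes = [Node(f"n{i}") for i in range(5)]
for rnd in range(1, 11):
    gossip_round(nodes, rnd, rng)
nodes[4].alive = False                            # n4 crashes quietly
for rnd in range(11, 31):
    gossip_round(nodes, rnd, rng)
print(nodes[0].suspects(rnd=30, timeout=5))
```

Run as-is, n4 should end up in n0's suspect list once its counter stops advancing. Production systems refine the fixed timeout; Cassandra, for example, uses a phi accrual failure detector.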

DATA

• Column families
• Keys and values
• Speed via appending data (no in-place updates) and timestamps
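One way to read "speed via appending data and timestamps" is last-write-wins reconciliation. This minimal sketch assumes writer-supplied timestamps. It ignores Cassandra's real storage engine (commit log, memtables, SSTables). Writes only append. Reads pick the newest timestamp regardless of arrival order. A background merge (compaction) later discards overwritten cells.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    value: str
    timestamp: int   # supplied by the writer, e.g. microseconds since the epoch


def write(log, key, value, timestamp):
    """Writes never modify data in place; they just append a new timestamped cell."""
    log.append((key, Cell(value, timestamp)))


def read(log, key):
    """Last write wins: the cell with the highest timestamp, whatever order writes arrived in."""
    cells = [cell for k, cell in log if k == key]
    return max(cells, key=lambda c: c.timestamp).value if cells else None


def compact(log):
    """Background merge: keep only the newest cell per key, dropping overwritten ones."""
    newest = {}
    for key, cell in log:
        if key not in newest or cell.timestamp > newest[key].timestamp:
            newest[key] = cell
    return sorted(newest.items())


log = []
write(log, "user:42:email", "old@example.com", 100)
write(log, "user:42:email", "new@example.com", 200)
write(log, "user:42:email", "delayed@example.com", 150)  # arrives last, but is older
print(read(log, "user:42:email"))   # new@example.com
print(compact(log))
```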

KEY VALUE: DYNAMO

• Replication
• REST
• Protocol Buffers for queries
• Tunable consistency

RIAK

• Simple interface
• High write availability
• Linear scaling
• REST API via HTTP: PUT, GET, DELETE, POST, etc.
• Or Protocol Buffers for quicker serialized data
• "Hundreds of nodes"

DISTRIBUTED

• Consistent hashing
• Vector clocks
• Sloppy quorums
• Virtual nodes: not machines but lightweight processes; more like having eggs in many baskets (easier to hand the eggs to other folks during a failure)
• Hinted handoff ("please pass this along")
• Replication

Request -> riak  <- ask other nodes ->
            |                          |
        virt node ->               virt node ->
            |                          |
        data store                 data store

And then the answers return back up the stack.
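Here is consistent hashing with virtual nodes, sketched in Python:

• Servers and keys are hashed onto the same ring.
• A key belongs to the first N distinct servers found walking clockwise from its position.
• Each server appears at many points (vnodes), so its share of the ring is spread out.

This is the classic textbook variant. Riak itself hashes keys with SHA-1 but divides the ring into a fixed number of equal-sized partitions that vnodes claim. Treat the details below as illustrative.

```python
import bisect
import hashlib


def ring_position(label):
    """Place a string on the ring: SHA-1 gives a position in [0, 2**160)."""
    return int(hashlib.sha1(label.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Consistent hashing with virtual nodes: each server owns many small arcs of the ring."""

    def __init__(self, vnodes_per_server=8):
        self.vnodes_per_server = vnodes_per_server
        self._points = []   # sorted ring positions of all virtual nodes
        self._owner = {}    # ring position -> physical server

    def add_server(self, server):
        for i in range(self.vnodes_per_server):
            point = ring_position(f"{server}#vnode{i}")
            bisect.insort(self._points, point)
            self._owner[point] = server

    def remove_server(self, server):
        self._points = [p for p in self._points if self._owner[p] != server]
        self._owner = {p: s for p, s in self._owner.items() if s != server}

    def preference_list(self, key, n=3):
        """Walk clockwise from the key's position and collect n distinct physical servers."""
        if not self._points:
            raise ValueError("the ring has no servers")
        start = bisect.bisect_right(self._points, ring_position(key))
        servers = []
        for i in range(len(self._points)):
            owner = self._owner[self._points[(start + i) % len(self._points)]]
            if owner not in servers:
                servers.append(owner)
                if len(servers) == n:
                    break
        return servers


ring = Ring()
for server in ["a", "b", "c", "d"]:
    ring.add_server(server)
keys = [f"key{i}" for i in range(1000)]
before = {k: ring.preference_list(k, 1)[0] for k in keys}
ring.add_server("e")                 # "just add more" servers
after = {k: ring.preference_list(k, 1)[0] for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved; all to 'e':",
      all(after[k] == "e" for k in moved))
```

Adding server "e" only moves keys onto "e"; nothing shuffles between the existing servers, which is what makes adding servers cheap. When a server in a key's list is down, a sloppy quorum lets the next server clockwise accept the write and hand it back later (hinted handoff).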

SERVERS

• "Just add more" servers
• Ring architecture: all nodes are peers
• Gossip protocols

KEYS AND BUCKETS

Riak can create keys automatically (and return the key to you).

http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true
  ^ gets all the keys
http://SERVER:PORT/riak/BUCKET?keys=stream
  ^ better for huge sets of data

You can store your code in a bucket.
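Here is a Python sketch of those URLs using only the standard library. It assumes a node at localhost on Riak's default HTTP port (8098), the legacy /riak/BUCKET/KEY scheme shown on the slide, and JSON values. Newer Riak releases also expose /buckets/BUCKET/keys/KEY. The functions only build urllib.request.Request objects; pass one to urllib.request.urlopen() to send it.

```python
import json
import urllib.request
from urllib.parse import quote, urlencode

# Assumptions: a Riak node at localhost on the default HTTP port and the legacy
# /riak/BUCKET/KEY URL scheme. Requests are built here, not sent.
BASE_URL = "http://localhost:8098/riak"


def object_url(bucket, key=None):
    url = f"{BASE_URL}/{quote(bucket, safe='')}"
    return url if key is None else f"{url}/{quote(key, safe='')}"


def json_body(doc):
    return json.dumps(doc).encode("utf-8")


def put_object(bucket, key, doc):
    """PUT stores a JSON document under a key you choose."""
    return urllib.request.Request(object_url(bucket, key), data=json_body(doc),
                                  headers={"Content-Type": "application/json"}, method="PUT")


def post_object(bucket, doc):
    """POST to the bucket with no key: Riak generates one and reports it in the Location header."""
    return urllib.request.Request(object_url(bucket), data=json_body(doc),
                                  headers={"Content-Type": "application/json"}, method="POST")


def get_object(bucket, key):
    return urllib.request.Request(object_url(bucket, key), method="GET")


def list_keys(bucket, stream=False):
    """keys=true returns every key in one response; keys=stream sends them in chunks."""
    query = urlencode({"keys": "stream" if stream else "true"})
    return urllib.request.Request(f"{object_url(bucket)}?{query}", method="GET")


for req in [put_object("talks", "nosql", {"speaker": "Eric"}),
            post_object("talks", {"speaker": "someone else"}),
            list_keys("talks", stream=True)]:
    print(req.get_method(), req.full_url)
# PUT http://localhost:8098/riak/talks/nosql
# POST http://localhost:8098/riak/talks
# GET http://localhost:8098/riak/talks?keys=stream
```

Listing keys touches every key in the cluster, so even the streaming form is best kept out of hot production paths.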

LINKS

curl ... -H 'Link: </riak/BUCKET/KEY>; riaktag="tagname"'

Link walking: following tagged links can build other structures on top of plain key/value pairs.
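Here is a sketch of links and link walking against a hypothetical in-memory store rather than a live cluster. link_header() formats (bucket, key, tag) triples the way the curl -H example does. walk() follows tagged links phase by phase, using "_" as a wildcard. In Riak's legacy HTTP API, the same walk is expressed as extra path segments of the form bucket,tag,keep after /riak/BUCKET/KEY. Check the documentation for your version before relying on that form.

```python
import re

LINK_RE = re.compile(r'</riak/([^/>]+)/([^>]+)>;\s*riaktag="([^"]*)"')


def link_header(links):
    """Format (bucket, key, tag) triples as a Link header value: </riak/BUCKET/KEY>; riaktag="tag"."""
    return ", ".join(f'</riak/{bucket}/{key}>; riaktag="{tag}"' for bucket, key, tag in links)


def parse_link_header(header):
    return LINK_RE.findall(header)


# Hypothetical objects: (bucket, key) -> object with its outgoing links (values omitted).
store = {
    ("people", "eric"): {"links": [("people", "ann", "manager"), ("talks", "nosql", "author")]},
    ("people", "ann"): {"links": [("people", "bob", "manager")]},
    ("people", "bob"): {"links": []},
    ("talks", "nosql"): {"links": []},
}


def walk(store, start, phases):
    """Follow links phase by phase; each phase is (bucket, tag) and "_" matches anything."""
    frontier = [start]
    for want_bucket, want_tag in phases:
        reached = []
        for obj in frontier:
            for bucket, key, tag in store[obj]["links"]:
                if want_bucket in ("_", bucket) and want_tag in ("_", tag) and (bucket, key) not in reached:
                    reached.append((bucket, key))
        frontier = reached
    return frontier


print(link_header(store[("people", "eric")]["links"]))
print(walk(store, ("people", "eric"), [("people", "manager"), ("people", "manager")]))  # [('people', 'bob')]
print(walk(store, ("people", "eric"), [("_", "author")]))                               # [('talks', 'nosql')]
```

The manager-of-my-manager walk shows how plain key/value pairs plus tagged links can stand in for a small hierarchy.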

HOMEWORK AND OTHER READINGS: GENERAL

Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf

Vogels' thoughts on eventual consistency: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Old-school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf

ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p

All your base: Dan Pritchett, "BASE: An Acid Alternative", http://queue.acm.org/detail.cfm?id=1394128

NoSQL Distilled by Sadalage and Fowler
Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONT'D

Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

HBase: The Definitive Guide by Lars George
HBase in Action by Dimiduk and Khurana
Hadoop: The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONT'D

• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com

• Nice video on system details on Safari by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True

• Riak Handbook: http://www.riakhandbook.com

READINGS FOR GRAPHS

Graph Databases by Robinson, Webber, and Eifrem (mostly about Neo4j; uses Cypher throughout)

RELATIONAL DATABASE EXAMPLES

http://db-engines.com/en/ranking_trend/relational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 26: Nosql

GRAPH DB EXAMPLES

httpdb-enginescomenranking_trenddocument+store

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 27: Nosql

HBASE Nosql on top of hadoop

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 28: Nosql

SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 29: Nosql

Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift

All operations are atomic at the row level (via write ahead logs)

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 30: Nosql

KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 31: Nosql

COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMO Replication; REST; Protocol Buffers for queries; tunable consistency

RIAK

Simple interface; high write availability; linear scaling. REST API via HTTP (PUT, GET, DELETE, POST, etc.), or Protocol Buffers for quicker serialized data. "Hundreds of nodes"

DISTRIBUTED Consistent hashing, vector clocks, sloppy quorums, virtual nodes (not machines but lightweight processes; more like having eggs in many baskets, which makes it easier to hand the eggs to other folks during a failure), hinted handoff ("please pass this along"), replication.

Request flow: request -> Riak node <-> ask other nodes; each node passes the request to its virtual nodes (virt node -> virt node), and each virtual node works against its own data store.

And then the answers return back up the stack.
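Vector clocks are how replicas tell "newer" apart from "written concurrently". Here is a sketch with plain dicts of per-node counters. The node names are hypothetical, and Riak's own implementation differs in detail. Two versions where neither clock dominates the other are siblings that need resolving.

```python
def increment(clock: dict, node: str) -> dict:
    """Return a copy of `clock` with `node`'s counter bumped (a local update)."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated

def descends(a: dict, b: dict) -> bool:
    """True if clock a has seen everything clock b has (a >= b, entry by entry)."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a newer"
    if b_ge_a:
        return "b newer"
    return "concurrent"  # siblings: neither write saw the other

def merge(a: dict, b: dict) -> dict:
    """Entry-wise maximum, used when resolving siblings."""
    return {node: max(a.get(node, 0), b.get(node, 0)) for node in a.keys() | b.keys()}

v1 = increment({}, "node1")   # first write, coordinated by node1
v2 = increment(v1, "node1")   # node1 updates it again
v3 = increment(v1, "node2")   # meanwhile node2 updates the old version
print(compare(v2, v1))        # a newer
print(compare(v2, v3))        # concurrent
resolved = increment(merge(v2, v3), "node1")
print(compare(resolved, v3))  # a newer
```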

SERVERS "Just add more" servers. Ring architecture: all nodes are peers. Gossip protocols.
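The ring is consistent hashing. Keys and servers hash onto the same circle, and a key belongs to the next servers clockwise. Riak itself splits a 160-bit SHA-1 ring into a fixed number of equal partitions. This sketch instead hashes virtual-node labels onto the ring, a common textbook variant. The server names and the `vnodes_per_server` value are hypothetical. It shows why "just add more" works: adding a server only moves keys onto the new server.

```python
import bisect
import hashlib

def ring_position(label: str) -> int:
    """Place a string on the ring: its 160-bit SHA-1 digest read as an integer."""
    return int.from_bytes(hashlib.sha1(label.encode("utf-8")).digest(), "big")

class Ring:
    def __init__(self, servers: list, vnodes_per_server: int = 8):
        # Every physical server claims several virtual-node points around the ring,
        # so its share of the keys is spread out instead of sitting in one arc.
        self.points = sorted(
            (ring_position(f"{server}#vnode{i}"), server)
            for server in servers
            for i in range(vnodes_per_server)
        )
        self.positions = [position for position, _ in self.points]
        self.servers = set(servers)

    def preference_list(self, key: str, n: int = 3) -> list:
        """Walk clockwise from the key's position and collect n distinct servers."""
        if n > len(self.servers):
            raise ValueError("n cannot exceed the number of physical servers")
        start = bisect.bisect_right(self.positions, ring_position(key))
        owners = []
        for step in range(len(self.points)):
            _, server = self.points[(start + step) % len(self.points)]
            if server not in owners:
                owners.append(server)
                if len(owners) == n:
                    break
        return owners

ring = Ring(["server-a", "server-b", "server-c", "server-d"])
print(ring.preference_list("user42"))  # three distinct servers, primary first

bigger = Ring(["server-a", "server-b", "server-c", "server-d", "server-e"])
moved = [k for k in (f"key{i}" for i in range(1000))
         if ring.preference_list(k, 1) != bigger.preference_list(k, 1)]
print(f"{len(moved)} of 1000 keys changed primary owner")
```

Every key that moved now belongs to server-e. Keys never shuffle between the old servers, which is what keeps growth cheap.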

KEYS AND BUCKETS Riak can create them automatically (and return the key to you).

http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true (gets all the keys)
http://SERVER:PORT/riak/BUCKET?keys=stream (better for huge sets of data)

You can store your code in a bucket.
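The URL patterns above can be built with the standard library. The host, port 8098 (Riak's usual HTTP port), bucket and key are example values. This follows the legacy `/riak/` scheme used on the slide. Newer Riak releases also expose `/buckets/BUCKET/keys/KEY`, so check the docs for your version. Nothing is sent here; the `Request` object could be passed to `urllib.request.urlopen()` against a live node.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

def object_url(server: str, port: int, bucket: str, key: str) -> str:
    """GET/PUT/DELETE target for one object: /riak/BUCKET/KEY."""
    return f"http://{server}:{port}/riak/{quote(bucket, safe='')}/{quote(key, safe='')}"

def list_keys_url(server: str, port: int, bucket: str, stream: bool = False) -> str:
    """Key listing for a whole bucket; 'stream' returns results in chunks for huge buckets."""
    query = urlencode({"keys": "stream" if stream else "true"})
    return f"http://{server}:{port}/riak/{quote(bucket, safe='')}?{query}"

def put_request(url: str, body: bytes, content_type: str = "application/json") -> Request:
    """Build (but do not send) a PUT that stores `body` under the URL's bucket/key."""
    return Request(url, data=body, method="PUT", headers={"Content-Type": content_type})

print(object_url("localhost", 8098, "people", "eric"))
# http://localhost:8098/riak/people/eric
print(list_keys_url("localhost", 8098, "people", stream=True))
# http://localhost:8098/riak/people?keys=stream
req = put_request(object_url("localhost", 8098, "people", "eric"), b'{"name": "Eric"}')
print(req.get_method(), req.full_url)
```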

LINKS curl ... -H 'Link: </riak/BUCKET/KEY>; riaktag="tagname"'

Link walking: following these links lets you build other structures.
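A sketch of what link walking does, using an in-memory dict in place of a Riak cluster. All buckets, keys and tags are hypothetical. Each step filters an object's links by bucket and tag, with `None` meaning "any". This roughly mirrors the bucket,tag filters of Riak's link-walking requests; it is an illustration, not the Riak API.

```python
def link_header(bucket: str, key: str, tag: str) -> str:
    """Value for the HTTP Link header that tags a relationship to another object."""
    return f'</riak/{bucket}/{key}>; riaktag="{tag}"'

# Hypothetical objects: (bucket, key) -> (value, [(target bucket, target key, tag), ...])
store = {
    ("people", "alice"): ("Alice", [("people", "bob", "friend"), ("talks", "nosql", "gave")]),
    ("people", "bob"): ("Bob", [("people", "carol", "friend")]),
    ("people", "carol"): ("Carol", []),
    ("talks", "nosql"): ("NoSQL talk", []),
}

def walk(start: tuple, steps: list) -> list:
    """Follow links step by step; each step is (bucket or None, tag or None)."""
    frontier = [start]
    for bucket, tag in steps:
        frontier = [
            (b, k)
            for node in frontier
            for b, k, t in store[node][1]
            if (bucket is None or b == bucket) and (tag is None or t == tag)
        ]
    return [store[node][0] for node in frontier]

print(link_header("people", "bob", "friend"))  # </riak/people/bob>; riaktag="friend"
print(walk(("people", "alice"), [("people", "friend")]))                        # ['Bob']
print(walk(("people", "alice"), [("people", "friend"), ("people", "friend")]))  # ['Carol']
print(walk(("people", "alice"), [(None, None)]))                                # ['Bob', 'NoSQL talk']
```

Chaining steps turns plain key/value pairs into a small graph, a friends-of-friends query in this case.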

HOMEWORK AND OTHER READINGS: GENERAL

Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf

Vogels' thoughts on "Eventually Consistent": http://www.allthingsdistributed.com/2008/12/eventually_consistent.html

Old-school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf

ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p

All your BASE: Dan Pritchett, "BASE: An Acid Alternative", http://queue.acm.org/detail.cfm?id=1394128

NoSQL Distilled by Sadalage and Fowler; Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONT'D

Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

HBase: The Definitive Guide by Lars George; HBase in Action by Dimiduk and Khurana; Hadoop: The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONT'D

• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com

• Nice video on system details on Safari by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True

• Riak Handbook: http://www.riakhandbook.com

READINGS FOR GRAPHS Graph Databases by Robinson, Webber and Eifrem. Mostly about Neo4j; uses Cypher throughout.

RELATIONAL DATABASE EXAMPLES

http://db-engines.com/en/ranking_trend/relational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 32: Nosql

REGIONS Looks like shards ndash different key ranges per box no overlap

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 33: Nosql

CASSANDRA Tunable nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 34: Nosql

CAP WITH QUORUMSKNOB TWEAKING

Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 35: Nosql

CAP WITH QUORUMSKNOB TWEAKING

Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the

database is donerdquo how many successful reads out of a full set ==

ldquohere is your datardquo

Higher the values longer the wait

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 36: Nosql

GOSSIP AND PEERING Whose up Passing requests Handling missing nodes

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 37: Nosql

DATA ColumnFamilies Keys and Values Speed via appending data and timestamps

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONTrsquoD

Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen

archivebigtable-osdi06pdf

Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONTrsquoD

bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom

bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar

yviewriak-core9781449306144part00htmlautoStart=True

bull Riak Handbookndash httpwwwriakhandbookcom

READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out

RELATIONAL DATABASE EXAMPLES

httpdb-enginescomenranking_trendrelational+dbms

  • nosql
  • Irsquom ERic
  • Whatrsquos wrong with relational databases
  • But first a poor metaphor
  • Wait I want more performance
  • But you canrsquot move IKEA furniture
  • Well we can solve that problem
  • The challenge of performance
  • So what is NoSQL lsquoum non-relationalrsquo
  • Sidebar observation on software teams
  • Fowlerrsquos Impedance Mismatch
  • CAP
  • You can have two
  • Only two the fine print
  • Why would anyone be inconsistent
  • Db chemistry ndash more buzz
  • How to distribute the data
  • What would Linnaeus say
  • Columnar Stores
  • Columnar Examples
  • Key-Value Stores
  • Key Value Examples
  • Document Stores
  • Document Examples
  • Graph databases
  • Graph DB Examples
  • HBase
  • Sits on top of HDFS
  • Slide 29
  • Kind of SQL
  • Column families
  • Regions
  • Cassandra
  • CAP with quorums knob tweaking
  • CAP with quorums knob tweaking (2)
  • Gossip and peering
  • Data
  • Key value dynamo
  • Riak
  • Distributed
  • Servers
  • Keys and buckets
  • links
  • Homework and other readings General
  • Homework and other readings contrsquod
  • Homework and other readings contrsquod
  • Readings for graphs
  • Relational database examples
Page 38: Nosql

KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency

RIAK

simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo

DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store

And then return answers back up the stack

SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols

KEYS AND BUCKETS Riak can create them automatically (and return to you the key)

httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket

LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo

Link walking ^ can create other structures

HOMEWORK AND OTHER READINGSGENERAL

Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf

Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml

Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc

eptpdf

ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p

All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128

NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson

HOMEWORK AND OTHER READINGS CONT'D

• Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en/archive/bigtable-osdi06.pdf

• HBase: The Definitive Guide by Lars George
• HBase in Action by Dimiduk and Khurana
• Hadoop: The Definitive Guide by Tom White

HOMEWORK AND OTHER READINGS CONT'D

• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com

• Nice video on system details, on Safari, by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True

• Riak Handbook: http://www.riakhandbook.com

READINGS FOR GRAPHS
Graph Databases by Robinson, Webber, and Eifrem: mostly about Neo4j; uses Cypher throughout.

RELATIONAL DATABASE EXAMPLES

http://db-engines.com/en/ranking_trend/relational+dbms
