nosql
NOSQL
Eric Marshall, April 7th 2016, for LOPSA NJ
I'M ERIC
I work at Airisdata and we are hiring: http://airisdata.com
WHAT'S WRONG WITH RELATIONAL DATABASES?
Nothing :)
Google & Amazon (followed by web tech) needed:
• Higher performance
• Larger scale
• Lower cost
• New capabilities
BUT FIRST A POOR METAPHOR
Cars: what leads to better performance?
• Bigger engine, remove excess weight/features
• Better controls/steering/braking
WAIT, I WANT MORE PERFORMANCE
We can go faster
BUT YOU CAN'T MOVE IKEA FURNITURE
Feature loss. Is it a car?
WELL, WE CAN SOLVE THAT PROBLEM
Also, it has a very powerful engine
THE CHALLENGE OF PERFORMANCE
<add wisdom> A long-winded way to say: "NoSQL" is a poor label
SO WHAT IS NOSQL? 'UM, NON-RELATIONAL'
No good definitions to be found. For me:
• Scales horizontally
• Forgoes the 'old school' SQL relations, concurrency, etc.
• "Exactly like SQL (except where it's not)": trades in or reimagines most SQL features for 'something else'
• Developer friendly/developer driven
• Schema-loose, semi-structured
• Usually open source and usually associated with web infrastructure
• Ignoring older non-relational databases of the past
• Scales horizontally (usually) - did I mention that?
• Can be 'glued' to other data stores
Don't like mine? Create your own definition :)
SIDEBAR: OBSERVATION ON SOFTWARE TEAMS
Software teams tied to a large central relational database (think 1990s/2000s): the large relational database 'glues' teams and apps together, which leads to complex databases and dbadmins
vs.
Software teams using NoSQL: independent except at the edges (input/logs & output/reports)
FOWLER'S IMPEDANCE MISMATCH
Java objects vs. rows in tables
What I have called Fowler's Impedance is mentioned in his and Sadalage's book, NoSQL Distilled
Most NoSQL beasties can store data in more interesting ways
CAP
Here because management loves to chat endlessly about it.
C is for Consistency: "This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time."
Most systems are not (exactly).
A is for Availability: "For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response ... even when severe network failures occur, every request must terminate."
I think everyone here understands this one :)
P is for Partition Tolerance: "In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another."
Quotes from "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services"
YOU CAN HAVE TWO
Consistency: the system may shut down or take a day to answer, but you will have the correct answer.
Availability: the system will always answer; you might get your checking balance from last year instead of today's balance, but you will get an answer.
Like asking a research group or asking folks in the pub.
Can't have both :( One can accept the write not knowing if all the servers are up, OR you can refuse until you know all the servers are up.
Partition tolerance is mandatory in distributed systems: https://codahale.com/you-cant-sacrifice-partition-tolerance/
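The accept-or-refuse choice can be sketched in a few lines of Python. This is a toy model only: the Replica class, the "CP"/"AP" mode names, and the balance numbers are made up for illustration and do not correspond to any particular database.

```python
class Replica:
    """One node holding a copy of a single value (hypothetical toy model)."""
    def __init__(self, value):
        self.value = value


class PartitionError(Exception):
    """Raised when a consistent write cannot reach every replica."""


def write(replicas, reachable, value, mode):
    """Write `value`; `reachable` lists the indices of replicas we can contact.

    mode="CP": refuse unless every replica is reachable (correct, but may not answer).
    mode="AP": accept on whatever is reachable (always answers, others may be stale).
    """
    if mode == "CP" and len(reachable) < len(replicas):
        raise PartitionError("refusing write: not all replicas reachable")
    for i in reachable:
        replicas[i].value = value
    return len(reachable)


replicas = [Replica(100), Replica(100), Replica(100)]

# Partition: replica 2 is cut off from the other two.
acked = write(replicas, [0, 1], 250, mode="AP")
stale_read = replicas[2].value  # an answer, but last year's balance

try:
    write(replicas, [0, 1], 300, mode="CP")
    cp_refused = False
except PartitionError:
    cp_refused = True
```

The "AP" write succeeds on two replicas and leaves the third with the old value; the "CP" write is refused outright and changes nothing.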
ONLY TWO: THE FINE PRINT
Only two at any moment in time :)
For some systems you can choose different pairs for each operation (Cassandra, Riak)
WHY WOULD ANYONE BE INCONSISTENT?
Speed while highly concurrent: "good now is better than perfect later", i.e. don't block
Handling "partition cases", i.e. part of the system/network is down
DB CHEMISTRY - MORE BUZZ
Is it ACID or BASE?
ACID: Atomicity, Consistency, Isolation, Durability
BASE: Basically Available, Soft-state, Eventually consistent
See "The Transaction Concept: Virtues and Limitations" by Jim Gray
HOW TO DISTRIBUTE THE DATA
Option 1: shard
Option 2: replicate
Option 3: do both
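A minimal sketch of option 3, assuming hash-based sharding and "next shards in line" replica placement. Both are illustrative choices, not a description of any specific product; the function names and the key "user:42" are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard by hashing the key (stable across runs, unlike built-in hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def placement(key, num_shards, replicas):
    """Shard AND replicate: the key's home shard plus the next `replicas - 1` shards."""
    home = shard_for(key, num_shards)
    return [(home + i) % num_shards for i in range(replicas)]


nodes = placement("user:42", num_shards=5, replicas=3)
```

Sharding alone spreads the load; replication alone survives a lost box; doing both gives each key one home shard and a couple of backup copies elsewhere.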
WHAT WOULD LINNAEUS SAY?
https://en.wikipedia.org/wiki/Linnaean_taxonomy
• Key-Value
• Graph DB
• Document
• Columnar (aka BigTable)
Disclaimer: heavy overlap
COLUMNAR STORES
Inspired by Google's Bigtable
Funky row/column setups
COLUMNAR EXAMPLES
http://db-engines.com/en/ranking_trend/wide+column+store
KEY-VALUE STORES
Designed for speed (even memory-only)
High load
Global data model of key-values (surprise)
Ring partition and replication
KEY VALUE EXAMPLES
http://db-engines.com/en/ranking_trend/key-value+store
DOCUMENT STORES
Similar to key-value, but the value is a document
Document is stored in JSON (or similar)
Flexible schema
Some support keys/references/indices
{ "date": [2016, 4, 1], "booktitle": "Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams" }
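To make "key-value where the value is a document" concrete, here is a toy sketch using a plain Python dict as the store. The keys "book:1" and "book:2", the second document, and the find helper are all made up for illustration; real document stores add indexes and query languages on top of this idea.

```python
import json

book = {
    "date": [2016, 4, 1],
    "booktitle": "Hitchhiker's Guide to the Galaxy",
    "author": "Douglas Adams",
}

# A toy "document store": key -> serialized JSON document.
store = {}
store["book:1"] = json.dumps(book)

# Flexible schema: a second (made-up) document carries different fields.
store["book:2"] = json.dumps({"booktitle": "Untitled example", "pages": 123})


def find(store, field, value):
    """Return keys of documents whose `field` equals `value` (full scan, no index)."""
    return [key for key, raw in store.items() if json.loads(raw).get(field) == value]


hits = find(store, "author", "Douglas Adams")
```

Note that the second document has no "author" field at all, and nothing complains; that is the flexible schema at work.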
DOCUMENT EXAMPLES
http://db-engines.com/en/ranking_trend/document+store
GRAPH DATABASES
Remember your data structures class in college?
Edges and vertices - both can hold data
Reduces tough SQL queries to simple graph queries
Easier to model - 'matches the whiteboard'
Relationships between vertices are first class
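A small sketch of the idea in plain Python, with data on both vertices and edges. The people, the "knows" edge type, and the helper functions are hypothetical; a real graph database would express the same traversal in its own query language.

```python
# Vertices and edges both carry data (all names are made up for illustration).
vertices = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
}
edges = [
    ("alice", "bob", {"type": "knows", "since": 2010}),
    ("bob", "carol", {"type": "knows", "since": 2014}),
]


def neighbors(vertex, edge_type):
    """Follow outgoing edges of one type from `vertex`."""
    return [dst for src, dst, data in edges if src == vertex and data["type"] == edge_type]


def friends_of_friends(vertex):
    """Two-hop traversal; in SQL this would typically be one self-join per hop."""
    result = []
    for friend in neighbors(vertex, "knows"):
        for fof in neighbors(friend, "knows"):
            if fof != vertex and fof not in result:
                result.append(fof)
    return result


fof = friends_of_friends("alice")
```

The traversal reads the way it would be drawn on a whiteboard: start at a vertex, follow an edge, follow another edge.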
GRAPH DB EXAMPLES
http://db-engines.com/en/ranking_trend/graph+dbms
HBASE
NoSQL on top of Hadoop
SITS ON TOP OF HDFS
Name nodes, data nodes, replication, and the rest of that whole megillah
Column-oriented
Handles 'wide', 'sparse' tables well
Fault tolerant
Supports Java, REST, Avro and Thrift
All operations are atomic at the row level (via write-ahead logs)
KIND OF SQL
Key-values: keys are arbitrary strings, values are an entire row of data
No joins
Apache Phoenix: JDBC interface
COLUMN FAMILIES
Column's full name = family name & column qualifier
Each column family's performance is configured independently
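One way to picture the family-plus-qualifier naming is a nested map: row key, then column family, then qualifier. The sketch below is only a mental model in plain Python, not the HBase client API; the put/get helpers, the "family:qualifier" string form, and the sample rows are assumptions made for illustration.

```python
# row key -> column family -> column qualifier -> value
table = {}


def put(row, column, value):
    """Store a cell; `column` uses the full name 'family:qualifier'."""
    family, qualifier = column.split(":", 1)
    table.setdefault(row, {}).setdefault(family, {})[qualifier] = value


def get(row, column):
    """Fetch a cell, or None if the row, family, or qualifier is absent."""
    family, qualifier = column.split(":", 1)
    return table.get(row, {}).get(family, {}).get(qualifier)


put("row1", "info:name", "Arthur")
put("row1", "info:planet", "Earth")
put("row2", "stats:visits", 42)  # sparse: row2 has no 'info' columns at all
```

Absent cells cost nothing in this model, which is why wide, sparse tables are a comfortable fit.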
REGIONS
Looks like shards - different key ranges per box, no overlap
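Non-overlapping key ranges make lookup a simple search over sorted start keys. A rough sketch, with made-up split keys and box names (this is not how HBase itself locates regions, just the range idea):

```python
import bisect

# Each region starts at a split key and runs up to the next split key.
# Split keys and box names are hypothetical.
split_keys = ["", "g", "n", "t"]  # "" marks the start of the key space
servers = ["box1", "box2", "box3", "box4"]


def region_for(row_key):
    """Find the box whose [start, next_start) range contains row_key."""
    index = bisect.bisect_right(split_keys, row_key) - 1
    return servers[index]


owner = region_for("mango")
```

Because the ranges are sorted and do not overlap, every row key lands on exactly one box.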
CASSANDRA
Tunable NoSQL
CAP WITH QUORUMS/KNOB TWEAKING
Symmetric, peer to peer
Linearly scalable
Replication
Eventual consistency
Partitioning
CAP WITH QUORUMS/KNOB TWEAKING (2)
Some systems choose per event. Three knobs:
• replication amount
• how many successful writes == "your write to the database is done"
• how many successful reads out of a full set == "here is your data"
The higher the values, the longer the wait
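The three knobs are often written N (replicas), W (write acks) and R (read acks); those letters and the level names below are conventions assumed here, not taken from the slides. A commonly cited rule of thumb is that reads are guaranteed to see the latest acknowledged write when R + W > N, because every read set then overlaps every write set. A small sketch:

```python
def overlaps(n, w, r):
    """True when every read set must share at least one replica with every write set."""
    return r + w > n


def acks_needed(level, n):
    """Translate a named level into a replica count (level names are illustrative)."""
    levels = {"one": 1, "quorum": n // 2 + 1, "all": n}
    return levels[level]


n = 3
w = acks_needed("quorum", n)
r = acks_needed("quorum", n)
strong = overlaps(n, w, r)
fast_but_loose = overlaps(n, acks_needed("one", n), acks_needed("one", n))
```

Turning W or R down to one gives the quickest answer and the weakest guarantee; turning them up buys overlap at the cost of waiting on more replicas.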
GOSSIP AND PEERING
Who's up?
Passing requests
Handling missing nodes
DATA
ColumnFamilies
Keys and values
Speed via appending data and timestamps
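"Appending data and timestamps" can be sketched as an append-only log where the newest timestamp wins at read time. This is a simplified illustration of the idea, not Cassandra's storage engine; the log structure, the helper names, and the integer timestamps are assumptions.

```python
log = []  # append-only list of (key, timestamp, value)


def write(key, value, timestamp):
    """Writes never update in place; they only append."""
    log.append((key, timestamp, value))


def read(key):
    """Return the value with the newest timestamp for `key`, or None."""
    versions = [(ts, value) for k, ts, value in log if k == key]
    if not versions:
        return None
    return max(versions, key=lambda pair: pair[0])[1]


write("balance", 100, timestamp=1)
write("balance", 250, timestamp=3)
write("balance", 175, timestamp=2)  # arrives late, but carries an older timestamp

current = read("balance")
```

Writes stay cheap because nothing is looked up or rewritten; the sorting-out happens when someone reads.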
KEY VALUE, DYNAMO
Replication
REST
Protocol Buffers for queries
Tunable consistency
RIAK
Simple interface, high write-availability, linear scaling
REST API via HTTP - PUT, GET, DELETE, POST, etc.
Or Protobufs for quicker serialized data
'Hundreds of nodes'
DISTRIBUTED
Consistent hashing, vector clocks, sloppy quorums
Virtual nodes (not machines but lightweight processes; more like having eggs in many baskets, so it is easier to give the eggs to folks during a failure)
Hinted handoff ("please pass along"), replication
Request -> riak
             |  <- ask other nodes ->
             |
         virt node -> virt node ->
             |             |
        data store    data store
And then return answers back up the stack
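Consistent hashing with virtual nodes can be sketched generically as follows. This is a textbook-style ring, not Riak's actual implementation; the Ring class, the node names, and the choice of 8 virtual nodes per machine are hypothetical.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a point on the ring."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node claims several points (virtual nodes) on the ring.
        self.points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self.hashes = [h for h, _ in self.points]

    def owner(self, key):
        """Walk clockwise from the key's hash to the next virtual node."""
        index = bisect.bisect_right(self.hashes, _hash(key)) % len(self.points)
        return self.points[index][1]


ring = Ring(["node-a", "node-b", "node-c"])
smaller = Ring(["node-a", "node-b"])  # the same ring after node-c fails

keys = [f"key{i}" for i in range(100)]
moved = [k for k in keys if ring.owner(k) != smaller.owner(k)]
```

When node-c drops out, only the keys it owned change hands, and because its virtual nodes are scattered around the ring those keys are spread over the survivors rather than dumped on one neighbor: eggs in many baskets.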
SERVERS
"Just add more" servers
Ring architecture - all nodes are peers
Gossip protocols
KEYS AND BUCKETS
Riak can create them automatically (and return to you the key)
http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET/KEY?keys=true ^ gets all the keys
http://SERVER:PORT/riak/BUCKET/KEY?keys=stream ^ better for huge sets of data
You can store your code in a bucket
LINKS
curl blah -H 'link: /riak/BUCKET/KEY; riaktag="tagname"'
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGS: GENERAL
Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf
Vogels' thoughts on eventually consistent: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Old school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
ACID defined: Haerder and Reuter, "Principles of transaction-oriented database recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p
All your base: Dan Pritchett, "BASE: An Acid Alternative", http://queue.acm.org/detail.cfm?id=1394128
NoSQL Distilled by Sadalage and Fowler
Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS, CONT'D
Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
HBase: The Definitive Guide by Lars George
HBase in Action by Dimiduk and Khurana
Hadoop: The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS, CONT'D
• A Little Riak Book by Eric Redmond - http://www.littleriakbook.com
• Nice video on system details on Safari by Justin Sheehy - https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True
• Riak Handbook - http://www.riakhandbook.com
READINGS FOR GRAPHS
Graph Databases by Robinson, Webber and Eifrem
Mostly about Neo4j; uses Cypher throughout
RELATIONAL DATABASE EXAMPLES
http://db-engines.com/en/ranking_trend/relational+dbms
BUT FIRST A POOR METAPHOR
Cars What leads to better performance
bull Bigger engine remove excess weightfeatures
bull Better controlssteeringbraking
WAIT I WANT MORE PERFORMANCE
We can go faster
BUT YOU CANrsquoT MOVE IKEA FURNITURE Feature loss Is it a car
WELL WE CAN SOLVE THAT PROBLEM Also it has a very powerful engine
THE CHALLENGE OF PERFORMANCE
ltadd wisdomgt long winded way to nosql is a poor label
SO WHAT IS NOSQL lsquoUM NON-RELATIONALrsquo No good definitions to be found For me Scales horizontally Foregoes the lsquoold schoolrsquo SQL relations concurrency etc
ldquoexactly like SQL (except where itrsquos not)rdquo Trades-in or reimagines most SQL features for lsquosomething elsersquo
Developer friendlydeveloper driven Schema loose semi-structured Usually Open Source and usually associated with web infrastructure Ignoring older non-relational databases of the past Scales Horizontally (usually) ndash did I mention that Can be lsquogluedrsquo to other data stores
Donrsquot like mine create your own definition )
SIDEBAR OBSERVATION ON SOFTWARE TEAMS
Software teams tied to large central relational database (think 1990s2000s) Large relational database lsquogluersquo teams and apps
together leads to complex databases and dbadmins
VsSoftware teams using no sql Independent except at the edges (inputlogs amp
outputreports)
FOWLERrsquoS IMPEDANCE MISMATCH
Java objects vs rows in tables
What I have called Fowlerrsquos Impedance is mentioned in his and Sadlagersquos book NoSQL Distilled
Most of nosql beasties can store data in more interesting ways
CAP
Here because management loves to chat endlessly about it.
C is for Consistency: "This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time."
Most systems are not (exactly).
A is for Availability: "For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response. … even when severe network failures occur, every request must terminate."
I think everyone here understands this one :)
P is for Partition Tolerance: "In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another."
Quotes from "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services"
YOU CAN HAVE TWO
• Consistency: the system may shut down or take a day to answer, but you will have the correct answer.
• Availability: the system will always answer; you might get your checking balance from last year instead of today's balance, but you will get an answer.
Like asking a research group or asking folks in the pub. Can't have both :(
One can accept the write, not knowing if all the servers are up, OR you can refuse until you know all the servers are up.
Partition Tolerance is mandatory in distributed systems:
https://codahale.com/you-cant-sacrifice-partition-tolerance
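The accept-or-refuse choice above can be sketched in a few lines of Python. This is only an illustration: the `Replica` class, the `up` flag, and the `write_cp`/`write_ap` names are made up for this example and do not model any real database.

```python
class Replica:
    """Hypothetical in-memory replica; `up` simulates reachability."""
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

def write_cp(replicas, key, value):
    """Consistency first: refuse the write unless every replica is reachable."""
    if not all(r.up for r in replicas):
        return False  # refuse rather than let replicas disagree
    for r in replicas:
        r.data[key] = value
    return True

def write_ap(replicas, key, value):
    """Availability first: accept the write on whichever replicas are reachable."""
    reachable = [r for r in replicas if r.up]
    for r in reachable:
        r.data[key] = value
    return len(reachable) > 0  # answered, but unreachable replicas are now stale

replicas = [Replica("a"), Replica("b"), Replica("c")]
replicas[2].up = False  # simulate a partition: replica "c" is cut off
cp_ok = write_cp(replicas, "balance", 100)
ap_ok = write_ap(replicas, "balance", 100)
```

With one replica cut off, the consistency-first write is refused while the availability-first write succeeds and leaves replica "c" without the new value.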
ONLY TWO: THE FINE PRINT
Only two at any moment in time :)
For some systems you can choose different pairs for each operation (Cassandra, Riak).
WHY WOULD ANYONE BE INCONSISTENT?
• Speed while highly concurrent: "good now is better than perfect later", i.e. don't block
• Handling "partition cases", i.e. part of the system/network is down
DB CHEMISTRY – MORE BUZZ
Is it ACID or BASE?
• Atomicity, Consistency, Isolation, Durability
• Basically Available, Soft-state, Eventually consistent
See "The Transaction Concept: Virtues and Limitations" by Jim Gray
HOW TO DISTRIBUTE THE DATA
• Option 1: shard
• Option 2: replicate
• Option 3: do both
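A minimal sketch of "do both", assuming simple hash-based sharding (range-based sharding is just as common). The function names and the `shardN-replicaM` node labels are invented for illustration.

```python
import hashlib

def shard_for(key, num_shards):
    """Pick a shard by hashing the key (stable across runs, unlike built-in hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def placement(key, num_shards, replicas_per_shard):
    """Shard first, then replicate that shard onto several nodes."""
    shard = shard_for(key, num_shards)
    return [f"shard{shard}-replica{i}" for i in range(replicas_per_shard)]

nodes = placement("user:42", num_shards=4, replicas_per_shard=3)
```

Sharding spreads the keys across boxes; replication keeps more than one copy of each shard so a single failure does not lose data.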
WHAT WOULD LINNAEUS SAY?
https://en.wikipedia.org/wiki/Linnaean_taxonomy
• Key-Value
• Graph DB
• Document
• Columnar (aka BigTable)
Disclaimer: heavy overlap
COLUMNAR STORES
• Inspired by Google's Bigtable
• Funky row/column setups
COLUMNAR EXAMPLES
http://db-engines.com/en/ranking_trend/wide+column+store
KEY-VALUE STORES
• Designed for speed (even memory-only)
• High load
• Global data model of key-values (surprise)
• Ring partition and replication
KEY VALUE EXAMPLES
http://db-engines.com/en/ranking_trend/key-value+store
DOCUMENT STORES
• Similar to key-value, but the value is a document
• Document is stored in JSON (or similar)
• Flexible schema
• Some support keys/references/indices
{ "date": [2016, 4, 1], "booktitle": "Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams" }
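To make "key-value, but the value is a document" concrete, here is a toy sketch in Python. The `store` dict and the `put`/`get`/`find` helpers are hypothetical and stand in for a real document database; the second document is invented to show the flexible schema.

```python
import json

store = {}  # hypothetical document store: key -> JSON text

def put(key, document):
    store[key] = json.dumps(document)

def get(key):
    return json.loads(store[key])

def find(field, value):
    """Naive query by field: scan every document (a real store would use an index)."""
    return [k for k in store if get(k).get(field) == value]

put("book:1", {"date": [2016, 4, 1],
               "booktitle": "Hitchhiker's Guide to the Galaxy",
               "author": "Douglas Adams"})
# Flexible schema: the next document carries different fields.
put("book:2", {"booktitle": "NoSQL Distilled", "authors": ["Sadalage", "Fowler"]})

matches = find("author", "Douglas Adams")
```

Unlike a plain key-value store, the database can look inside the value, which is what makes queries on fields such as `author` possible.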
DOCUMENT EXAMPLES
http://db-engines.com/en/ranking_trend/document+store
GRAPH DATABASES
Remember your data structures class in college?
• Edges and vertices – both can hold data
• Reduces tough SQL queries to simple graph queries
• Easier to model – 'matches the whiteboard'
• Relationships between vertices are first class
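A small sketch of the idea, using plain Python structures rather than a real graph database. The people, the `knows` edge type, and the helper names are all made up for illustration.

```python
from collections import deque

# Vertices and edges both hold data (hypothetical example data).
vertices = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
}
edges = [
    ("alice", "bob", {"type": "knows", "since": 2010}),
    ("bob", "carol", {"type": "knows", "since": 2014}),
]

def neighbors(vertex, edge_type):
    """Follow outgoing edges of one type from a vertex."""
    return [dst for src, dst, props in edges
            if src == vertex and props["type"] == edge_type]

def reachable(start, edge_type, max_hops):
    """Breadth-first walk: everything within max_hops along edges of one type."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        vertex, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in neighbors(vertex, edge_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, hops + 1))
    return found

friends_of_friends = reachable("alice", "knows", max_hops=2)
```

In SQL, each extra hop usually means another self-join on a relationship table; here it is just one more step of the walk.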
GRAPH DB EXAMPLES
http://db-engines.com/en/ranking_trend/graph+dbms
HBASE
NoSQL on top of Hadoop
SITS ON TOP OF HDFS
• Name nodes
• Data nodes
• Replication
• And the rest of that whole megillah
• Column-oriented
• Handles 'wide', 'sparse' tables well
• Fault tolerant
• Supports Java, REST, Avro, and Thrift
• All operations are atomic at the row level (via write-ahead logs)
KIND OF SQL
• Key–values
• Keys are arbitrary strings
• Values are an entire row of data
• No joins
• Apache Phoenix: JDBC interface
COLUMN FAMILIES
• Column's full name = family name & column qualifier
• Each column family's performance is configured independently
REGIONS
• Look like shards – different key ranges per box, no overlap
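A rough Python model of both ideas, not HBase client code. HBase writes a full column name as `family:qualifier`; the row contents, the region boundaries, and the `box1`..`box3` server names below are hypothetical.

```python
from bisect import bisect_right

# One sparse row: only the columns that actually exist are stored.
row = {
    "info:name": "Arthur",
    "info:planet": "Earth",
    "stats:towels": "1",
}

def family_of(column):
    """Split 'family:qualifier' and return the family part."""
    return column.split(":", 1)[0]

def columns_in(row, family):
    return {c: v for c, v in row.items() if family_of(c) == family}

# Regions: sorted, non-overlapping row-key ranges, one server each.
# region_starts[i] is the first row key served by servers[i].
region_starts = ["", "g", "p"]
servers = ["box1", "box2", "box3"]

def server_for(row_key):
    """Find the region whose key range contains row_key."""
    return servers[bisect_right(region_starts, row_key) - 1]
```

Because the ranges are sorted and do not overlap, finding the right box for a row key is a binary search over the region start keys.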
CASSANDRA
Tunable NoSQL
CAP WITH QUORUMS/KNOB TWEAKING
• Symmetric, peer to peer
• Linearly scalable
• Replication
• Eventual consistency
• Partitioning
CAP WITH QUORUMS/KNOB TWEAKING
Some systems choose per event. Three knobs:
• Replication amount
• How many successful writes == "your write to the database is done"
• How many successful reads out of a full set == "here is your data"
The higher the values, the longer the wait.
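The three knobs are often written N (replicas), W (write acknowledgements), and R (read responses). A read is guaranteed to touch at least one replica that saw the latest acknowledged write when R + W > N. A minimal sketch of that arithmetic; the level names `one`, `quorum`, and `all` are illustrative labels here, not any product's API.

```python
def overlaps(n, w, r):
    """True when every read set must intersect every write set (R + W > N)."""
    return r + w > n

def acks_needed(level, n):
    """Translate a named level into a replica count."""
    return {"one": 1, "quorum": n // 2 + 1, "all": n}[level]

n = 3
w = acks_needed("quorum", n)
r = acks_needed("quorum", n)
```

With N = 3, quorum writes and quorum reads (2 + 2 > 3) overlap; W = 1 and R = 1 answer faster but can return stale data.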
GOSSIP AND PEERING
• Who's up?
• Passing requests
• Handling missing nodes
DATA
• ColumnFamilies
• Keys and values
• Speed via appending data and timestamps
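One way to picture "appending data and timestamps": writes never update in place, they only append, and a read picks the version with the newest timestamp. This is a toy model with made-up names, not Cassandra's storage engine.

```python
log = []  # append-only list of (key, value, timestamp)

def write(key, value, timestamp):
    """Append a new version; existing entries are never modified."""
    log.append((key, value, timestamp))

def read(key):
    """Return the value with the newest timestamp for key, or None."""
    versions = [(ts, value) for k, value, ts in log if k == key]
    if not versions:
        return None
    return max(versions, key=lambda v: v[0])[1]

write("a", "old", 1)
write("a", "new", 2)
write("a", "late arrival", 1)  # arrives last but carries an older timestamp
current = read("a")
```

Appending is cheap because nothing has to be located and rewritten; the cost moves to reads and to background cleanup of old versions.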
KEY VALUE DYNAMO
• Replication
• REST
• Protocol Buffers for queries
• Tunable consistency
RIAK
• Simple interface
• High write-availability
• Linear scaling
• REST API via HTTP – PUT, GET, DELETE, POST, etc.
• Or Protobufs for quicker serialized data
• 'Hundreds of nodes'
DISTRIBUTED
• Consistent hashing
• Vector clocks
• Sloppy quorums
• Virtual nodes (not machines but lightweight processes – more like having eggs in many baskets – easier to give the eggs to folks during a failure)
• Hinted handoff ("please pass along")
• Replication
Request -> riak | <- ask other nodes -> | | virt node -> virt node -> | | data store data store
And then return answers back up the stack.
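Consistent hashing with virtual nodes can be sketched as follows. This is the generic textbook form, not Riak's actual partition-claim scheme; the `Ring` class, the node names, and the choice of MD5 are assumptions made for the example.

```python
import hashlib
from bisect import bisect_right

def h(text):
    """Hash a string to a position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes_per_node=8):
        # Each physical node claims several points (virtual nodes) on the ring.
        self.points = sorted((h(f"{node}#{i}"), node)
                             for node in nodes for i in range(vnodes_per_node))
        self.hashes = [p[0] for p in self.points]

    def owners(self, key, n):
        """Walk clockwise from the key's hash, collecting n distinct nodes."""
        start = bisect_right(self.hashes, h(key))
        found = []
        for step in range(len(self.points)):
            node = self.points[(start + step) % len(self.points)][1]
            if node not in found:
                found.append(node)
            if len(found) == n:
                break
        return found

all_nodes = ["node-a", "node-b", "node-c"]
ring = Ring(all_nodes)
owners = ring.owners("user:42", 2)
```

Removing a node that does not own a key leaves that key's owners unchanged, which is the property that makes adding and removing servers cheap.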
SERVERS
• "Just add more" servers
• Ring architecture – all nodes are peers
• Gossip protocols
KEYS AND BUCKETS
Riak can create them automatically (and return the key to you).
http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true ^ gets all the keys
http://SERVER:PORT/riak/BUCKET?keys=stream ^ better for huge sets of data
You can store your code in a bucket.
LINKS
curl blah -H "Link: </riak/BUCKET/KEY>; riaktag=\"tagname\""
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGS: GENERAL
• Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf
• Vogels' thoughts on eventually consistent: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
• Old school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
• ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p
• All your base: Dan Pritchett, "BASE: An Acid Alternative", http://queue.acm.org/detail.cfm?id=1394128
• NoSQL Distilled by Sadalage and Fowler
• Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS, CONT'D
• Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
• HBase: The Definitive Guide by Lars George
• HBase in Action by Dimiduk and Khurana
• Hadoop: The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS, CONT'D
• A Little Riak Book by Eric Redmond – http://www.littleriakbook.com
• Nice video on system details on Safari by Justin Sheehy – https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True
• Riak Handbook – http://www.riakhandbook.com
READINGS FOR GRAPHS
Graph Databases by Robinson, Webber, and Eifrem. Mostly about Neo4j; uses Cypher throughout.
RELATIONAL DATABASE EXAMPLES
http://db-engines.com/en/ranking_trend/relational+dbms
WELL WE CAN SOLVE THAT PROBLEM Also it has a very powerful engine
THE CHALLENGE OF PERFORMANCE
ltadd wisdomgt long winded way to nosql is a poor label
SO WHAT IS NOSQL lsquoUM NON-RELATIONALrsquo No good definitions to be found For me Scales horizontally Foregoes the lsquoold schoolrsquo SQL relations concurrency etc
ldquoexactly like SQL (except where itrsquos not)rdquo Trades-in or reimagines most SQL features for lsquosomething elsersquo
Developer friendlydeveloper driven Schema loose semi-structured Usually Open Source and usually associated with web infrastructure Ignoring older non-relational databases of the past Scales Horizontally (usually) ndash did I mention that Can be lsquogluedrsquo to other data stores
Donrsquot like mine create your own definition )
SIDEBAR OBSERVATION ON SOFTWARE TEAMS
Software teams tied to large central relational database (think 1990s2000s) Large relational database lsquogluersquo teams and apps
together leads to complex databases and dbadmins
VsSoftware teams using no sql Independent except at the edges (inputlogs amp
outputreports)
FOWLERrsquoS IMPEDANCE MISMATCH
Java objects vs rows in tables
What I have called Fowlerrsquos Impedance is mentioned in his and Sadlagersquos book NoSQL Distilled
Most of nosql beasties can store data in more interesting ways
CAP Here because management loves to chat endlessly about it C is for Consistency ldquoThis is equivalent to requiring requests of the distributed shared memory
to act as if they were executing on a single node responding to operations one at a time
Most systems are not (exactly)
A is for Availability ldquoFor a distributed system to be continuously available every request
received by a non-failing node in the system must result in a response hellipeven when severe network failures occur every request must terminaterdquo
I think everyone here understands this one )
P is for Partition Tolerance ldquoIn order to model partition tolerance the network will be allowed to lose
arbitrarily many messages sent from one node to anotherrdquoQuotes from ldquoBrewerrsquos Conjecture and the Feasibility of Consistent Available Partition-Tolerant Web Servicesrdquo
YOU CAN HAVE TWO Consistency The system may shutdown or take a day to answer but you will have
the correct answer
Availability The system will always answer you might get your checking balance
from last year instead of todayrsquos balance but you will get an answer
Like asking a research group or asking folks in the pub Canrsquot have both ( One can accept the write not knowing if all the servers are up OR you can refuse until you know all the servers are up Partition Tolerance is mandatory in distributed systems
httpscodahalecomyou-cant-sacrifice-partition-tolerance
ONLY TWO THE FINE PRINT Only two at any moment in time ) For some systems you can choose different pairs for each operation (Cassandra Riak)
WHY WOULD ANYONE BE INCONSISTENT Speed while highly concurrent ldquogood now better is than perfect laterrdquo ie donrsquot block
Handling ldquopartition casesrdquo ie part of the systemnetwork is down
DB CHEMISTRY ndash MORE BUZZ Is it ACID or BASE Atomicity Consistency Isolation Durability Basically Available Soft-state Eventually consistent
See ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray
HOW TO DISTRIBUTE THE DATA
Option 1 shard Option 2 replicate Option 3 do both
WHAT WOULD LINNAEUS SAY
Key-Value
httpsenwikipediaorgwikiLinnaean_taxonomy
Graph DB
Document
Columnar (aka BigTable)
Disclaimer heavy overlap
COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups
COLUMNAR EXAMPLES
httpdb-enginescomenranking_trendwide+column+store
KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication
KEY VALUE EXAMPLES
httpdb-enginescomenranking_trendkey-value+store
DOCUMENT STORES Similar to key-value but the value is a document
Document is stored in json (or similar) Flexible schema Some support keysreferencesindices
ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo
DOCUMENT EXAMPLES
httpdb-enginescomenranking_trenddocument+store
GRAPH DATABASES Remember your data structures class in college
Edges and vertices ndash both can hold data
Reduces tough sql queries to simple graph queries
Easier to model ndash lsquomatches the whiteboardrsquo
Relationships between vertices are first class
GRAPH DB EXAMPLES
httpdb-enginescomenranking_trenddocument+store
HBASE
NoSQL on top of Hadoop
SITS ON TOP OF HDFS
Name nodes
Data nodes
Replication
And the rest of that whole megillah
Column-oriented
Handles 'wide', 'sparse' tables well
Fault tolerant
Supports Java, REST, Avro and Thrift
All operations are atomic at the row level (via write-ahead logs)
KIND OF SQL
Key-values
Keys are arbitrary strings
Values are an entire row of data
No joins
Apache Phoenix: JDBC interface
COLUMN FAMILIES
Column's full name = family name & column qualifier
Each column family's performance is configured independently
REGIONS
Looks like shards - different key ranges per box, no overlap
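A sketch of the two ideas above: a column's full name built from family and qualifier (HBase joins them with a colon), and regions as non-overlapping key ranges, each served by one box. RegionMap, the split keys and the box names are made up for illustration; this is not the HBase client API.

```python
import bisect


def full_column_name(family, qualifier):
    """HBase writes a column as family:qualifier."""
    return f"{family}:{qualifier}"


class RegionMap:
    """Hypothetical lookup table: sorted split keys divide the key space.

    With split keys ["g", "n"] there are three regions:
    keys < "g", "g" <= keys < "n", and keys >= "n".
    """

    def __init__(self, split_keys, servers):
        if list(split_keys) != sorted(split_keys):
            raise ValueError("split keys must be sorted")
        if len(servers) != len(split_keys) + 1:
            raise ValueError("need exactly one server per region")
        self.split_keys = list(split_keys)
        self.servers = list(servers)

    def server_for(self, row_key):
        # bisect_right makes each region's start key inclusive.
        return self.servers[bisect.bisect_right(self.split_keys, row_key)]


if __name__ == "__main__":
    regions = RegionMap(["g", "n"], ["box1", "box2", "box3"])
    print(full_column_name("info", "email"))
    print(regions.server_for("monkey"))
```

Because the ranges are sorted and do not overlap, one binary search finds the box for any row key, and a scan over adjacent keys stays on one box most of the time.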
CASSANDRA
Tunable NoSQL
CAP WITH QUORUMS/KNOB TWEAKING
Symmetric, peer to peer
Linearly scalable
Replication
Eventual consistency
Partitioning
CAP WITH QUORUMS/KNOB TWEAKING
Some systems choose per event
Three knobs:
replication amount
how many successful writes == "your write to the database is done"
how many successful reads out of a full set == "here is your data"
The higher the values, the longer the wait
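The three knobs are commonly written N (replicas), W (write acknowledgements) and R (read responses). The slide does not state it, but the usual rule of thumb in Dynamo-style systems is that reads are guaranteed to see the latest acknowledged write when R + W > N, because every read set must then overlap every write set. A minimal sketch, with function names chosen here for illustration:

```python
def quorum_overlaps(n, w, r):
    """True when any r replicas must share at least one member with any w replicas."""
    return r + w > n


def newest(responses):
    """Pick the value with the highest timestamp among the replicas that answered.

    responses is a list of (timestamp, value) pairs.
    """
    return max(responses, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    # N=3, W=2, R=2: slower, but a read always reaches a replica with the latest write.
    print(quorum_overlaps(3, 2, 2))
    # N=3, W=1, R=1: fastest, but a read may hit a replica that missed the write.
    print(quorum_overlaps(3, 1, 1))
    print(newest([(100, "old balance"), (250, "new balance")]))
```

Raising W or R buys consistency at the cost of waiting on more replicas, which is the trade the slide describes.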
GOSSIP AND PEERING
Who's up?
Passing requests
Handling missing nodes
DATA
ColumnFamilies
Keys and values
Speed via appending data and timestamps
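"Appending data and timestamps" can be sketched as a log that is never updated in place: every write is an append, and a read keeps the entry with the newest timestamp. AppendOnlyCells is a hypothetical toy, assuming caller-supplied timestamps; it leaves out compaction, tombstones and everything else a real storage engine needs.

```python
class AppendOnlyCells:
    """Hypothetical append-only store with last-write-wins reads."""

    def __init__(self):
        self._log = []  # (key, timestamp, value), in arrival order

    def write(self, key, value, timestamp):
        # No lookup and no in-place update: writes only append.
        self._log.append((key, timestamp, value))

    def read(self, key):
        matches = [(ts, value) for k, ts, value in self._log if k == key]
        if not matches:
            return None
        # The newest timestamp wins, regardless of arrival order.
        return max(matches, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    cells = AppendOnlyCells()
    cells.write("user:1:email", "new@example.com", timestamp=20)
    cells.write("user:1:email", "old@example.com", timestamp=10)  # arrives late
    print(cells.read("user:1:email"))
```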
KEY VALUE DYNAMO
Replication
REST
Protocol Buffers for queries
Tunable consistency
RIAK
Simple interface, high write-availability, linear scaling
REST API via HTTP - PUT, GET, DELETE, POST, etc.
Or Protobufs for quicker serialized data
'hundreds of nodes'
DISTRIBUTED
Consistent hashing, vector clocks, sloppy quorums
Virtual nodes (not machines but lightweight processes - more like having eggs in many baskets - easier to give the eggs to folks during a failure)
Hinted handoff ("please pass along")
Replication
Request -> riak | <- ask other nodes ->
           |              |
       virt node ->   virt node ->
           |              |
       data store     data store
And then return answers back up the stack
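A minimal sketch of consistent hashing with virtual nodes, assuming each physical node is hashed onto the ring several times and a key is owned by the next distinct nodes clockwise from its hash. HashRing and its parameters are illustrative; Riak itself divides the ring into fixed, equal partitions, so treat this as the general Dynamo-style idea rather than Riak's exact scheme.

```python
import bisect
import hashlib


def _position(label):
    """Map any string to a point on the ring."""
    return int(hashlib.md5(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical ring: each physical node appears as several virtual nodes."""

    def __init__(self, nodes, vnodes_per_node=8):
        self._ring = sorted(
            (_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key, n):
        """Walk clockwise from the key and collect the first n distinct nodes."""
        start = bisect.bisect_right(self._positions, _position(key))
        owners = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in owners:
                owners.append(node)
            if len(owners) == n:
                break
        return owners


if __name__ == "__main__":
    ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
    print(ring.preference_list("BUCKET/KEY", 3))
```

The payoff is the "eggs in many baskets" point: when a node leaves, only the keys it owned move, and they spread across the remaining nodes instead of landing on a single neighbour.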
SERVERS
"Just add more" servers
Ring architecture - all nodes are peers
Gossip protocols
KEYS AND BUCKETS
Riak can create them automatically (and return the key to you)
http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true  ^ gets all the keys
http://SERVER:PORT/riak/BUCKET?keys=stream  ^ better for huge sets of data
You can store your code in a bucket
LINKS
curl blah -H "Link: </riak/BUCKET/KEY>; riaktag=\"tagname\""
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGS: GENERAL
Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf
Vogels' thoughts on eventual consistency: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Old school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p
All your base: Dan Pritchett, "BASE: An Acid Alternative", http://queue.acm.org/detail.cfm?id=1394128
NoSQL Distilled by Sadalage and Fowler
Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONT'D
Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
HBase: The Definitive Guide by Lars George
HBase in Action by Dimiduk and Khurana
Hadoop: The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONT'D
• A Little Riak Book by Eric Redmond - http://www.littleriakbook.com
• Nice video on system details on Safari by Justin Sheehy - https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True
• Riak Handbook - http://www.riakhandbook.com
READINGS FOR GRAPHS
Graph Databases by Robinson, Webber and Eifrem
Mostly about Neo4j; uses Cypher throughout
RELATIONAL DATABASE EXAMPLES
http://db-engines.com/en/ranking_trend/relational+dbms
FOWLER'S IMPEDANCE MISMATCH
Java objects vs. rows in tables
What I have called Fowler's Impedance is mentioned in his and Sadalage's book, NoSQL Distilled
Most of the NoSQL beasties can store data in more interesting ways
CAP
Here because management loves to chat endlessly about it
C is for Consistency: "This is equivalent to requiring requests of the distributed shared memory to act as if they were executing on a single node, responding to operations one at a time."
Most systems are not (exactly)
A is for Availability: "For a distributed system to be continuously available, every request received by a non-failing node in the system must result in a response ... even when severe network failures occur, every request must terminate."
I think everyone here understands this one :)
P is for Partition Tolerance: "In order to model partition tolerance, the network will be allowed to lose arbitrarily many messages sent from one node to another."
Quotes from "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services"
YOU CAN HAVE TWO
Consistency: The system may shut down or take a day to answer, but you will have the correct answer
Availability: The system will always answer; you might get your checking balance from last year instead of today's balance, but you will get an answer
Like asking a research group or asking folks in the pub
Can't have both :(
One can accept the write not knowing if all the servers are up, OR you can refuse until you know all the servers are up
Partition Tolerance is mandatory in distributed systems
https://codahale.com/you-cant-sacrifice-partition-tolerance
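A toy sketch of the "accept the write OR refuse" choice, in Python. The Replica and Cluster classes and both method names are invented for illustration and do not come from any real database; the point is only to show the two behaviors side by side while one server is cut off.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the data; reachable is False while it is partitioned away."""
    value: int = 0
    reachable: bool = True


@dataclass
class Cluster:
    replicas: list = field(default_factory=list)

    def write_consistent(self, value):
        """Consistency first: refuse unless every replica can take the write."""
        if not all(r.reachable for r in self.replicas):
            return False  # no answer rather than a possibly wrong one
        for r in self.replicas:
            r.value = value
        return True

    def write_available(self, value):
        """Availability first: accept the write on whatever is reachable."""
        updated = [r for r in self.replicas if r.reachable]
        for r in updated:
            r.value = value
        return bool(updated)  # unreachable replicas are now stale


cluster = Cluster([Replica(), Replica(), Replica(reachable=False)])
refused = cluster.write_consistent(100)   # one server is down, so refuse
accepted = cluster.write_available(100)   # take the write anyway
values = [r.value for r in cluster.replicas]
```

After the second call, two replicas hold the new value and the third still holds the old one, which is the "checking balance from last year" situation in miniature.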
ONLY TWO: THE FINE PRINT
Only two at any moment in time :)
For some systems you can choose different pairs for each operation (Cassandra, Riak)
WHY WOULD ANYONE BE INCONSISTENT?
Speed while highly concurrent: "good now is better than perfect later", i.e. don't block
Handling "partition cases", i.e. part of the system/network is down
DB CHEMISTRY - MORE BUZZ
Is it ACID or BASE?
ACID: Atomicity, Consistency, Isolation, Durability
BASE: Basically Available, Soft-state, Eventually consistent
See "The Transaction Concept: Virtues and Limitations" by Jim Gray
HOW TO DISTRIBUTE THE DATA
Option 1: shard
Option 2: replicate
Option 3: do both
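A minimal Python sketch of the three options, assuming a fixed list of nodes and a simple hash-modulo placement. The node names and both helper functions are hypothetical; real systems use more careful placement schemes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names


def shard_for(key, num_shards):
    """Sharding: a stable hash of the key picks exactly one shard."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(key, nodes, copies):
    """Do both: start at the key's home shard, then copy to the next nodes."""
    start = shard_for(key, len(nodes))
    return [nodes[(start + i) % len(nodes)] for i in range(copies)]


home = shard_for("user:42", len(NODES))            # option 1: one owner
owners = replicas_for("user:42", NODES, copies=3)  # option 3: three owners
```

With copies=1 this is plain sharding; with copies equal to the number of nodes it is plain replication. Anything in between is "do both".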
WHAT WOULD LINNAEUS SAY?
https://en.wikipedia.org/wiki/Linnaean_taxonomy
Key-Value
Graph DB
Document
Columnar (aka BigTable)
Disclaimer: heavy overlap
COLUMNAR STORES
Inspired by Google's Bigtable
Funky row/column setups
COLUMNAR EXAMPLES
http://db-engines.com/en/ranking_trend/wide+column+store
KEY-VALUE STORES
Designed for speed (even memory-only)
High load
Global data model of key-values (surprise)
Ring partition and replication
KEY VALUE EXAMPLES
http://db-engines.com/en/ranking_trend/key-value+store
DOCUMENT STORES
Similar to key-value, but the value is a document
Document is stored in JSON (or similar)
Flexible schema
Some support keys/references/indices
{ "date": [2016, 4, 1], "booktitle": "The Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams" }
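To make "flexible schema" concrete, here is a small Python sketch that treats a dict of JSON strings as a stand-in for a document store. The keys "book:1" and "book:2" and the find_by_field helper are made up for illustration; the second document deliberately has different fields from the first.

```python
import json

# Two documents in one collection; they need not share the same fields.
collection = {
    "book:1": json.dumps({
        "date": [2016, 4, 1],
        "booktitle": "The Hitchhiker's Guide to the Galaxy",
        "author": "Douglas Adams",
    }),
    "book:2": json.dumps({
        "booktitle": "NoSQL Distilled",
        "authors": ["Sadalage", "Fowler"],
    }),
}


def find_by_field(store, field_name):
    """Return the keys of documents that contain the given field."""
    return [key for key, raw in store.items() if field_name in json.loads(raw)]


with_single_author = find_by_field(collection, "author")
with_title = find_by_field(collection, "booktitle")
```

Scanning every document like this is the slow path; the "some support indices" line is about stores that can answer such a lookup without the full scan.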
DOCUMENT EXAMPLES
http://db-engines.com/en/ranking_trend/document+store
GRAPH DATABASES
Remember your data structures class in college?
Edges and vertices - both can hold data
Reduces tough SQL queries to simple graph queries
Easier to model - 'matches the whiteboard'
Relationships between vertices are first class
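A rough Python sketch of the idea that edges and vertices both hold data, with a two-hop "friends of friends" query. All names and properties are invented, and plain Python lists stand in for a graph database; a real one would use its own query language (Cypher, for Neo4j).

```python
# Vertices and edges both carry properties (all names here are made up).
vertices = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
}
edges = [
    ("alice", "bob", {"type": "FRIEND", "since": 2014}),
    ("bob", "carol", {"type": "FRIEND", "since": 2015}),
]


def neighbors(vertex, edge_type):
    """Follow outgoing edges of one type from a vertex."""
    return [dst for src, dst, props in edges
            if src == vertex and props["type"] == edge_type]


def friends_of_friends(vertex):
    """Two hops along FRIEND edges; in SQL this would be a self-join."""
    result = set()
    for friend in neighbors(vertex, "FRIEND"):
        result.update(neighbors(friend, "FRIEND"))
    result.discard(vertex)
    return result


fof = friends_of_friends("alice")
```

Each extra hop is one more loop here, where the relational version needs one more join on the same table.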
GRAPH DB EXAMPLES
http://db-engines.com/en/ranking_trend/graph+dbms
HBASE
NoSQL on top of Hadoop
SITS ON TOP OF HDFS
Name nodes
Data nodes
Replication
And the rest of that whole megillah
Column-oriented
Handles 'wide', 'sparse' tables well
Fault tolerant
Supports Java, REST, Avro, and Thrift
All operations are atomic at the row level (via write-ahead logs)
KIND OF SQL
Key-values
Keys are arbitrary strings
Values are an entire row of data
No joins
Apache Phoenix: JDBC interface
COLUMN FAMILIES
Column's full name = family name & column qualifier
Each column family's performance is configured independently
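One way to picture the "family name & column qualifier" naming is as nested maps: row key, then family, then qualifier. The Python sketch below uses made-up rows and a hypothetical get_cell helper; it only models the naming, not HBase's storage or client API.

```python
# row key -> column family -> column qualifier -> value
table = {
    "row-001": {
        "info": {"name": "Arthur", "planet": "Earth"},
        "stats": {"visits": "42"},
    },
    "row-002": {
        "info": {"name": "Ford"},  # sparse: no planet, no stats family
    },
}


def get_cell(row_key, full_name):
    """Look up a cell by its full column name, written as 'family:qualifier'."""
    family, qualifier = full_name.split(":", 1)
    return table.get(row_key, {}).get(family, {}).get(qualifier)


planet = get_cell("row-001", "info:planet")
missing = get_cell("row-002", "stats:visits")
```

A missing cell costs nothing to store, which is why wide, sparse tables fit this layout well.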
REGIONS
Looks like shards - different key ranges per box, no overlap
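A small Python sketch of "different key ranges per box, no overlap": keep the sorted start keys of each range and binary-search for the range that contains a row key. The start keys and box names are invented for illustration.

```python
import bisect

# Each region starts at a row key and runs up to the next start key.
region_starts = ["", "g", "n", "t"]  # sorted start keys (made up)
region_servers = ["box-1", "box-2", "box-3", "box-4"]


def server_for(row_key):
    """Find the single region whose key range contains row_key."""
    index = bisect.bisect_right(region_starts, row_key) - 1
    return region_servers[index]


placements = {key: server_for(key) for key in ["apple", "grape", "zebra"]}
```

Because the ranges are sorted and do not overlap, every row key lands on exactly one box, and neighboring keys tend to land on the same box.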
CASSANDRA
Tunable NoSQL
CAP WITH QUORUMS/KNOB TWEAKING
Symmetric, peer to peer
Linearly scalable
Replication
Eventual consistency
Partitioning
CAP WITH QUORUMS/KNOB TWEAKING (2)
Some systems choose per event
Three knobs:
Replication amount
How many successful writes == "your writing to the database is done"
How many successful reads out of a full set == "here is your data"
The higher the values, the longer the wait
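The three knobs are often written N (replicas), W (write acknowledgements) and R (read responses). Those letters and the R + W > N overlap rule come from the Dynamo-style literature, not from the slide, so treat this Python sketch as an illustration under that assumption; it also ignores sloppy quorums and failures mid-write.

```python
def overlaps(n, w, r):
    """True when every read set must share a replica with every write set."""
    return r + w > n


def describe(n, w, r):
    """Summarize what a given knob setting buys you."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    if overlaps(n, w, r):
        return "reads see the latest write"
    return "reads may be stale"


quorum = describe(n=3, w=2, r=2)  # 2 + 2 > 3, slower but overlapping
fast = describe(n=3, w=1, r=1)    # 1 + 1 <= 3, quick but possibly stale
```

Raising W or R means waiting on more servers per request, which is the "higher the values, longer the wait" trade.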
GOSSIP AND PEERING
Who's up?
Passing requests
Handling missing nodes
DATA
ColumnFamilies
Keys and Values
Speed via appending data and timestamps
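A minimal Python sketch of "speed via appending data and timestamps": writes only append, and a read picks the entry with the newest timestamp. The in-memory list, the write and read helpers, and the sample data are all hypothetical; this is not Cassandra's storage engine.

```python
log = []  # append-only list of (row_key, column, value, timestamp)


def write(row_key, column, value, timestamp):
    """A write never updates in place; it only appends a timestamped cell."""
    log.append((row_key, column, value, timestamp))


def read(row_key, column):
    """Return the value with the highest timestamp, or None if never written."""
    matches = [(ts, value) for rk, col, value, ts in log
               if rk == row_key and col == column]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0])[1]


write("user:1", "email", "old@example.com", 100)
write("user:1", "email", "new@example.com", 200)
write("user:1", "email", "late-arrival@example.com", 150)  # arrives out of order
current = read("user:1", "email")
```

The late arrival does not win, because the comparison is on timestamps rather than on arrival order.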
KEY VALUE DYNAMO
Replication
REST
Protocol Buffers for queries
Tunable consistency
RIAK
Simple interface
High write-availability
Linear scaling
REST API via HTTP - PUT, GET, DELETE, POST, etc.
Or Protobufs for quicker serialized data
'Hundreds of nodes'
DISTRIBUTED
Consistent hashing
Vector clocks
Sloppy quorums
Virtual nodes (not machines but lightweight processes - more like having eggs in many baskets - easier to give the eggs to folks during a failure)
Hinted handoff ("please pass along")
Replication
Request -> riak <- ask other nodes ->
            |                      |
        virt node              virt node
            |                      |
        data store             data store
And then return answers back up the stack
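A generic consistent-hashing sketch with virtual nodes, in Python. It is not Riak's actual ring implementation; the Ring class, the server names, and the choice of 8 virtual nodes per server are assumptions made for illustration. It shows the "eggs in many baskets" effect: when a server joins, only the keys it takes over change owner.

```python
import bisect
import hashlib


def ring_position(name):
    """Map any string to a point on the ring."""
    return int(hashlib.sha1(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, servers, vnodes_per_server=8):
        # Each server claims several points on the ring (its virtual nodes).
        self.points = sorted(
            (ring_position(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes_per_server)
        )

    def owner(self, key):
        """Walk clockwise from the key's position to the next virtual node."""
        index = bisect.bisect_right(self.points, (ring_position(key), ""))
        return self.points[index % len(self.points)][1]


keys = [f"key-{i}" for i in range(200)]
before = Ring(["s1", "s2", "s3"])
after = Ring(["s1", "s2", "s3", "s4"])
moved = [k for k in keys if before.owner(k) != after.owner(k)]
```

Every key in moved now belongs to s4; keys that stayed put did not have to be copied anywhere, which is what makes "just add more servers" workable.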
SERVERS
"Just add more" servers
Ring architecture - all nodes are peers
Gossip protocols
KEYS AND BUCKETS
Riak can create keys automatically (and return the key to you)
http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true
^ gets all the keys
http://SERVER:PORT/riak/BUCKET?keys=stream
^ better for huge sets of data
You can store your code in a bucket
LINKS
curl blah -H 'Link: </riak/BUCKET/KEY>; riaktag="tagname"'
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGS: GENERAL
Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf
Vogels' thoughts on Eventually Consistent: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Old-school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p
All your base: Dan Pritchett, "BASE: An ACID Alternative", http://queue.acm.org/detail.cfm?id=1394128
NoSQL Distilled by Sadalage and Fowler
Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS, CONT'D
Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
HBase: The Definitive Guide by Lars George
HBase in Action by Dimiduk and Khurana
Hadoop: The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS, CONT'D
• A Little Riak Book by Eric Redmond - http://www.littleriakbook.com
• Nice video on system details on Safari by Justin Sheehy - https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True
• Riak Handbook - http://www.riakhandbook.com
READINGS FOR GRAPHS
Graph Databases by Robinson, Webber, and Eifrem
Mostly about Neo4j; uses Cypher throughout
RELATIONAL DATABASE EXAMPLES
http://db-engines.com/en/ranking_trend/relational+dbms
CAP Here because management loves to chat endlessly about it C is for Consistency ldquoThis is equivalent to requiring requests of the distributed shared memory
to act as if they were executing on a single node responding to operations one at a time
Most systems are not (exactly)
A is for Availability ldquoFor a distributed system to be continuously available every request
received by a non-failing node in the system must result in a response hellipeven when severe network failures occur every request must terminaterdquo
I think everyone here understands this one )
P is for Partition Tolerance ldquoIn order to model partition tolerance the network will be allowed to lose
arbitrarily many messages sent from one node to anotherrdquoQuotes from ldquoBrewerrsquos Conjecture and the Feasibility of Consistent Available Partition-Tolerant Web Servicesrdquo
YOU CAN HAVE TWO Consistency The system may shutdown or take a day to answer but you will have
the correct answer
Availability The system will always answer you might get your checking balance
from last year instead of todayrsquos balance but you will get an answer
Like asking a research group or asking folks in the pub Canrsquot have both ( One can accept the write not knowing if all the servers are up OR you can refuse until you know all the servers are up Partition Tolerance is mandatory in distributed systems
httpscodahalecomyou-cant-sacrifice-partition-tolerance
ONLY TWO THE FINE PRINT Only two at any moment in time ) For some systems you can choose different pairs for each operation (Cassandra Riak)
WHY WOULD ANYONE BE INCONSISTENT Speed while highly concurrent ldquogood now better is than perfect laterrdquo ie donrsquot block
Handling ldquopartition casesrdquo ie part of the systemnetwork is down
DB CHEMISTRY ndash MORE BUZZ Is it ACID or BASE Atomicity Consistency Isolation Durability Basically Available Soft-state Eventually consistent
See ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray
HOW TO DISTRIBUTE THE DATA
Option 1 shard Option 2 replicate Option 3 do both
WHAT WOULD LINNAEUS SAY
Key-Value
httpsenwikipediaorgwikiLinnaean_taxonomy
Graph DB
Document
Columnar (aka BigTable)
Disclaimer heavy overlap
COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups
COLUMNAR EXAMPLES
httpdb-enginescomenranking_trendwide+column+store
KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication
KEY VALUE EXAMPLES
httpdb-enginescomenranking_trendkey-value+store
DOCUMENT STORES Similar to key-value but the value is a document
Document is stored in json (or similar) Flexible schema Some support keysreferencesindices
ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo
DOCUMENT EXAMPLES
httpdb-enginescomenranking_trenddocument+store
GRAPH DATABASES Remember your data structures class in college
Edges and vertices ndash both can hold data
Reduces tough sql queries to simple graph queries
Easier to model ndash lsquomatches the whiteboardrsquo
Relationships between vertices are first class
GRAPH DB EXAMPLES
httpdb-enginescomenranking_trenddocument+store
HBASE Nosql on top of hadoop
SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah
Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift
All operations are atomic at the row level (via write ahead logs)
KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface
COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently
REGIONS Looks like shards ndash different key ranges per box no overlap
CASSANDRA Tunable nosql
CAP WITH QUORUMSKNOB TWEAKING
Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning
CAP WITH QUORUMSKNOB TWEAKING
Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the
database is donerdquo how many successful reads out of a full set ==
ldquohere is your datardquo
Higher the values longer the wait
GOSSIP AND PEERING Whose up Passing requests Handling missing nodes
DATA ColumnFamilies Keys and Values Speed via appending data and timestamps
KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency
RIAK
simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo
DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store
And then return answers back up the stack
SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols
KEYS AND BUCKETS Riak can create them automatically (and return to you the key)
httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket
LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGSGENERAL
Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf
Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml
Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc
eptpdf
ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p
All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128
NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONTrsquoD
Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen
archivebigtable-osdi06pdf
Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONTrsquoD
bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom
bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar
yviewriak-core9781449306144part00htmlautoStart=True
bull Riak Handbookndash httpwwwriakhandbookcom
READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out
RELATIONAL DATABASE EXAMPLES
httpdb-enginescomenranking_trendrelational+dbms
- nosql
- Irsquom ERic
- Whatrsquos wrong with relational databases
- But first a poor metaphor
- Wait I want more performance
- But you canrsquot move IKEA furniture
- Well we can solve that problem
- The challenge of performance
- So what is NoSQL lsquoum non-relationalrsquo
- Sidebar observation on software teams
- Fowlerrsquos Impedance Mismatch
- CAP
- You can have two
- Only two the fine print
- Why would anyone be inconsistent
- Db chemistry ndash more buzz
- How to distribute the data
- What would Linnaeus say
- Columnar Stores
- Columnar Examples
- Key-Value Stores
- Key Value Examples
- Document Stores
- Document Examples
- Graph databases
- Graph DB Examples
- HBase
- Sits on top of HDFS
- Slide 29
- Kind of SQL
- Column families
- Regions
- Cassandra
- CAP with quorums knob tweaking
- CAP with quorums knob tweaking (2)
- Gossip and peering
- Data
- Key value dynamo
- Riak
- Distributed
- Servers
- Keys and buckets
- links
- Homework and other readings General
- Homework and other readings contrsquod
- Homework and other readings contrsquod
- Readings for graphs
- Relational database examples
-
YOU CAN HAVE TWO Consistency The system may shutdown or take a day to answer but you will have
the correct answer
Availability The system will always answer you might get your checking balance
from last year instead of todayrsquos balance but you will get an answer
Like asking a research group or asking folks in the pub Canrsquot have both ( One can accept the write not knowing if all the servers are up OR you can refuse until you know all the servers are up Partition Tolerance is mandatory in distributed systems
httpscodahalecomyou-cant-sacrifice-partition-tolerance
ONLY TWO THE FINE PRINT Only two at any moment in time ) For some systems you can choose different pairs for each operation (Cassandra Riak)
WHY WOULD ANYONE BE INCONSISTENT Speed while highly concurrent ldquogood now better is than perfect laterrdquo ie donrsquot block
Handling ldquopartition casesrdquo ie part of the systemnetwork is down
DB CHEMISTRY ndash MORE BUZZ Is it ACID or BASE Atomicity Consistency Isolation Durability Basically Available Soft-state Eventually consistent
See ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray
HOW TO DISTRIBUTE THE DATA
Option 1 shard Option 2 replicate Option 3 do both
WHAT WOULD LINNAEUS SAY
Key-Value
httpsenwikipediaorgwikiLinnaean_taxonomy
Graph DB
Document
Columnar (aka BigTable)
Disclaimer heavy overlap
COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups
COLUMNAR EXAMPLES
httpdb-enginescomenranking_trendwide+column+store
KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication
KEY VALUE EXAMPLES
httpdb-enginescomenranking_trendkey-value+store
DOCUMENT STORES Similar to key-value but the value is a document
Document is stored in json (or similar) Flexible schema Some support keysreferencesindices
ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo
DOCUMENT EXAMPLES
httpdb-enginescomenranking_trenddocument+store
GRAPH DATABASES Remember your data structures class in college
Edges and vertices ndash both can hold data
Reduces tough sql queries to simple graph queries
Easier to model ndash lsquomatches the whiteboardrsquo
Relationships between vertices are first class
GRAPH DB EXAMPLES
httpdb-enginescomenranking_trenddocument+store
HBASE Nosql on top of hadoop
SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah
Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift
All operations are atomic at the row level (via write ahead logs)
KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface
COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently
REGIONS Looks like shards ndash different key ranges per box no overlap
CASSANDRA Tunable nosql
CAP WITH QUORUMSKNOB TWEAKING
Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning
CAP WITH QUORUMSKNOB TWEAKING
Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the
database is donerdquo how many successful reads out of a full set ==
ldquohere is your datardquo
Higher the values longer the wait
GOSSIP AND PEERING Whose up Passing requests Handling missing nodes
DATA ColumnFamilies Keys and Values Speed via appending data and timestamps
KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency
RIAK
simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo
DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store
And then return answers back up the stack
SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols
KEYS AND BUCKETS Riak can create them automatically (and return to you the key)
httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket
LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGSGENERAL
Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf
Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml
Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc
eptpdf
ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p
All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128
NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONTrsquoD
Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen
archivebigtable-osdi06pdf
Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONTrsquoD
bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom
bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar
yviewriak-core9781449306144part00htmlautoStart=True
bull Riak Handbookndash httpwwwriakhandbookcom
READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out
RELATIONAL DATABASE EXAMPLES
httpdb-enginescomenranking_trendrelational+dbms
- nosql
- Irsquom ERic
- Whatrsquos wrong with relational databases
- But first a poor metaphor
- Wait I want more performance
- But you canrsquot move IKEA furniture
- Well we can solve that problem
- The challenge of performance
- So what is NoSQL lsquoum non-relationalrsquo
- Sidebar observation on software teams
- Fowlerrsquos Impedance Mismatch
- CAP
- You can have two
- Only two the fine print
- Why would anyone be inconsistent
- Db chemistry ndash more buzz
- How to distribute the data
- What would Linnaeus say
- Columnar Stores
- Columnar Examples
- Key-Value Stores
- Key Value Examples
- Document Stores
- Document Examples
- Graph databases
- Graph DB Examples
- HBase
- Sits on top of HDFS
- Slide 29
- Kind of SQL
- Column families
- Regions
- Cassandra
- CAP with quorums knob tweaking
- CAP with quorums knob tweaking (2)
- Gossip and peering
- Data
- Key value dynamo
- Riak
- Distributed
- Servers
- Keys and buckets
- links
- Homework and other readings General
- Homework and other readings contrsquod
- Homework and other readings contrsquod
- Readings for graphs
- Relational database examples
-
ONLY TWO THE FINE PRINT Only two at any moment in time ) For some systems you can choose different pairs for each operation (Cassandra Riak)
WHY WOULD ANYONE BE INCONSISTENT Speed while highly concurrent ldquogood now better is than perfect laterrdquo ie donrsquot block
Handling ldquopartition casesrdquo ie part of the systemnetwork is down
DB CHEMISTRY ndash MORE BUZZ Is it ACID or BASE Atomicity Consistency Isolation Durability Basically Available Soft-state Eventually consistent
See ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray
HOW TO DISTRIBUTE THE DATA
Option 1 shard Option 2 replicate Option 3 do both
WHAT WOULD LINNAEUS SAY
Key-Value
httpsenwikipediaorgwikiLinnaean_taxonomy
Graph DB
Document
Columnar (aka BigTable)
Disclaimer heavy overlap
COLUMNAR STORES Inspired by Googlersquos Bigtable Funky rowcolumn setups
COLUMNAR EXAMPLES
httpdb-enginescomenranking_trendwide+column+store
KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication
KEY VALUE EXAMPLES
httpdb-enginescomenranking_trendkey-value+store
DOCUMENT STORES Similar to key-value but the value is a document
Document is stored in json (or similar) Flexible schema Some support keysreferencesindices
ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo
DOCUMENT EXAMPLES
httpdb-enginescomenranking_trenddocument+store
GRAPH DATABASES Remember your data structures class in college
Edges and vertices ndash both can hold data
Reduces tough sql queries to simple graph queries
Easier to model ndash lsquomatches the whiteboardrsquo
Relationships between vertices are first class
GRAPH DB EXAMPLES
httpdb-enginescomenranking_trenddocument+store
HBASE Nosql on top of hadoop
SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah
Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift
All operations are atomic at the row level (via write ahead logs)
KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface
COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently
REGIONS Looks like shards ndash different key ranges per box no overlap
CASSANDRA Tunable nosql
CAP WITH QUORUMSKNOB TWEAKING
Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning
CAP WITH QUORUMSKNOB TWEAKING
Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the
database is donerdquo how many successful reads out of a full set ==
ldquohere is your datardquo
Higher the values longer the wait
GOSSIP AND PEERING Whose up Passing requests Handling missing nodes
DATA ColumnFamilies Keys and Values Speed via appending data and timestamps
KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency
RIAK
simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo
DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store
And then return answers back up the stack
SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols
KEYS AND BUCKETS Riak can create them automatically (and return to you the key)
httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket
LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGSGENERAL
Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf
Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml
Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc
eptpdf
ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p
All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128
NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONTrsquoD
Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen
archivebigtable-osdi06pdf
Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONTrsquoD
bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom
bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar
yviewriak-core9781449306144part00htmlautoStart=True
bull Riak Handbookndash httpwwwriakhandbookcom
READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out
RELATIONAL DATABASE EXAMPLES
httpdb-enginescomenranking_trendrelational+dbms
- nosql
- Irsquom ERic
- Whatrsquos wrong with relational databases
- But first a poor metaphor
- Wait I want more performance
- But you canrsquot move IKEA furniture
- Well we can solve that problem
- The challenge of performance
- So what is NoSQL lsquoum non-relationalrsquo
- Sidebar observation on software teams
- Fowlerrsquos Impedance Mismatch
- CAP
- You can have two
- Only two the fine print
- Why would anyone be inconsistent
- Db chemistry ndash more buzz
- How to distribute the data
- What would Linnaeus say
- Columnar Stores
- Columnar Examples
- Key-Value Stores
- Key Value Examples
- Document Stores
- Document Examples
- Graph databases
- Graph DB Examples
- HBase
- Sits on top of HDFS
- Slide 29
- Kind of SQL
- Column families
- Regions
- Cassandra
- CAP with quorums knob tweaking
- CAP with quorums knob tweaking (2)
- Gossip and peering
- Data
- Key value dynamo
- Riak
- Distributed
- Servers
- Keys and buckets
- links
- Homework and other readings General
- Homework and other readings contrsquod
- Homework and other readings contrsquod
- Readings for graphs
- Relational database examples
-
WHY WOULD ANYONE BE INCONSISTENT Speed while highly concurrent ldquogood now better is than perfect laterrdquo ie donrsquot block
Handling ldquopartition casesrdquo ie part of the systemnetwork is down
DB CHEMISTRY ndash MORE BUZZ Is it ACID or BASE Atomicity Consistency Isolation Durability Basically Available Soft-state Eventually consistent
See ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray
HOW TO DISTRIBUTE THE DATA
Option 1 shard Option 2 replicate Option 3 do both
WHAT WOULD LINNAEUS SAY
Key-Value
httpsenwikipediaorgwikiLinnaean_taxonomy
Graph DB
Document
Columnar (aka BigTable)
Disclaimer heavy overlap
COLUMNAR STORES
Inspired by Google’s Bigtable
Funky row/column setups
COLUMNAR EXAMPLES
http://db-engines.com/en/ranking_trend/wide+column+store
KEY-VALUE STORES
Designed for speed (even memory-only)
High load
Global data model of key-values (surprise)
Ring partition and replication
KEY VALUE EXAMPLES
http://db-engines.com/en/ranking_trend/key-value+store
DOCUMENT STORES
Similar to key-value, but the value is a document
Document is stored in JSON (or similar)
Flexible schema
Some support keys/references/indices
{ "date": [2016, 4, 1], "booktitle": "Hitchhiker's Guide to the Galaxy", "author": "Douglas Adams" }
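A toy illustration of the "flexible schema" point, assuming an in-memory dict as the store. The `put`, `get`, and `find` helpers and the "book:N" keys are hypothetical, not any product's API.

```python
import json

# A toy in-memory "document store": key -> serialized JSON document.
store = {}

def put(key, document):
    # Keep a serialized copy, the way a real store keeps bytes, not live objects.
    store[key] = json.dumps(document)

def get(key):
    return json.loads(store[key])

def find(field, value):
    """Naive full scan standing in for an index lookup."""
    return [k for k in store if get(k).get(field) == value]

put("book:1", {"date": [2016, 4, 1],
               "booktitle": "Hitchhiker's Guide to the Galaxy",
               "author": "Douglas Adams"})
# Flexible schema: the second document carries different fields.
put("book:2", {"booktitle": "NoSQL Distilled",
               "authors": ["Sadalage", "Fowler"]})

print(find("author", "Douglas Adams"))
```

Nothing had to be declared up front for the second document to use `authors` instead of `author`; the cost is that queries must cope with missing fields.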
DOCUMENT EXAMPLES
http://db-engines.com/en/ranking_trend/document+store
GRAPH DATABASES
Remember your data structures class in college?
Edges and vertices – both can hold data
Reduces tough SQL queries to simple graph queries
Easier to model – ‘matches the whiteboard’
Relationships between vertices are first class
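A small sketch of the idea, assuming plain Python structures rather than a real graph database. The people, the "knows" edge type, and the helper names are invented for illustration. A friends-of-friends question is a short walk here, where SQL would need one self-join per hop.

```python
from collections import deque

# Vertices and edges both carry data (properties).
vertices = {
    "alice": {"kind": "person"},
    "bob": {"kind": "person"},
    "carol": {"kind": "person"},
}
edges = [
    ("alice", "bob", {"type": "knows", "since": 2010}),
    ("bob", "carol", {"type": "knows", "since": 2014}),
]

def neighbors(vertex, edge_type):
    """Follow outgoing edges of one type from a vertex."""
    return [dst for src, dst, props in edges
            if src == vertex and props["type"] == edge_type]

def reachable(start, edge_type, max_hops):
    """Breadth-first walk: which vertices are within max_hops edges of start?"""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        vertex, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in neighbors(vertex, edge_type):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return seen - {start}

print(sorted(reachable("alice", "knows", 2)))
```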
GRAPH DB EXAMPLES
http://db-engines.com/en/ranking_trend/graph+dbms
HBASE
NoSQL on top of Hadoop
SITS ON TOP OF HDFS
Name nodes
Data nodes
Replication
And the rest of that whole megillah
Column-oriented
Handles ‘wide’, ‘sparse’ tables well
Fault tolerant
Supports Java, REST, Avro, and Thrift
All operations are atomic at the row level (via write-ahead logs)
KIND OF SQL
Key-values
Keys are arbitrary strings
Values are an entire row of data
No joins
Apache Phoenix: JDBC interface
COLUMN FAMILIES
Column’s full name = family name & column qualifier
Each column family’s performance is configured independently
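A toy model of the naming scheme, assuming nested dicts stand in for the table. The "family:qualifier" spelling follows the common HBase shell convention; the helper names and sample cells are hypothetical, and this is not how HBase lays data out on disk.

```python
# Row key -> {"family:qualifier": value}; only cells that exist are stored.
table = {}

def put(row_key, family, qualifier, value):
    # A column's full name is the family name plus the column qualifier.
    table.setdefault(row_key, {})[f"{family}:{qualifier}"] = value

def get_family(row_key, family):
    """Return every cell of one column family for a row."""
    prefix = family + ":"
    return {name: value for name, value in table.get(row_key, {}).items()
            if name.startswith(prefix)}

put("row1", "info", "name", "example")
put("row1", "info", "city", "somewhere")
put("row1", "stats", "visits", 7)
put("row2", "info", "name", "other")  # row2 has no "stats" cells at all

print(get_family("row1", "info"))
```

The sparse second row costs nothing for the columns it lacks, which is why ‘wide’, ‘sparse’ tables are cheap in this model.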
REGIONS
Looks like shards – different key ranges per box, no overlap
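A sketch of range-based routing, assuming each region is described only by its start key and that keys sort as plain strings. The split points and box names are invented.

```python
import bisect

# Each region owns keys from its start key (inclusive) up to the next start key.
# The first region starts at "" so every key has exactly one owner.
region_starts = ["", "g", "n", "t"]
region_servers = ["box1", "box2", "box3", "box4"]

def server_for(row_key: str) -> str:
    """Find the last region whose start key is <= row_key."""
    index = bisect.bisect_right(region_starts, row_key) - 1
    return region_servers[index]

print(server_for("apple"), server_for("grape"), server_for("zebra"))
```

Because the ranges are sorted and do not overlap, neighbouring keys land on the same box, which keeps range scans cheap but can create hot spots for sequential keys.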
CASSANDRA
Tunable NoSQL
CAP WITH QUORUMS/KNOB TWEAKING
Symmetric peer to peer
Linearly scalable
Replication
Eventual consistency
Partitioning
CAP WITH QUORUMS/KNOB TWEAKING
Some systems choose per event
Three knobs:
replication amount
how many successful writes == “your write to the database is done”
how many successful reads out of a full set == “here is your data”
The higher the values, the longer the wait
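The three knobs are often written N (replicas), W (write acknowledgements), and R (read responses). A minimal sketch of the usual overlap rule follows, assuming strict quorums with no sloppy quorum or hinted handoff in play. The function names are made up.

```python
def overlaps(n: int, w: int, r: int) -> bool:
    """True when any R replicas read must include one of the W replicas written."""
    return r + w > n

def describe(n: int, w: int, r: int) -> str:
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    if overlaps(n, w, r):
        return "reads overlap the latest acknowledged write"
    return "reads may be stale"

for n, w, r in [(3, 2, 2), (3, 1, 1), (3, 3, 1)]:
    print(n, w, r, describe(n, w, r))
```

With N=3, the setting W=2, R=2 gives overlap at the price of waiting for two boxes each way. W=1, R=1 answers fastest but may return old data.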
GOSSIP AND PEERING
Who’s up?
Passing requests
Handling missing nodes
DATA
ColumnFamilies
Keys and Values
Speed via appending data and timestamps
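A rough sketch of "appending data and timestamps", assuming a single in-memory log and a counter in place of a real clock. It shows only the idea that writes append and reads pick the newest timestamp; it is not Cassandra's storage engine.

```python
import itertools

log = []                    # append-only: writes never modify old entries
clock = itertools.count(1)  # stand-in for a real timestamp source

def write(key, column, value):
    # An update is just another appended entry with a newer timestamp.
    log.append((next(clock), key, column, value))

def read(key, column):
    """Return the value with the newest timestamp, or None if never written."""
    matches = [(ts, value) for ts, k, c, value in log if k == key and c == column]
    return max(matches)[1] if matches else None

write("u1", "email", "old@example.com")
write("u1", "email", "new@example.com")
print(read("u1", "email"), len(log))
```

Writes stay fast because nothing is looked up or rewritten in place; the work of choosing the winner moves to read time (and, in real systems, to background compaction).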
KEY VALUE DYNAMO
Replication
REST
Protocol Buffers for queries
Tunable consistency
RIAK
Simple interface
High write-availability
Linear scaling
REST API via HTTP – PUT, GET, DELETE, POST, etc.
Or Protobufs for quicker serialized data
‘Hundreds of nodes’
DISTRIBUTED
Consistent hashing
Vector clocks
Sloppy quorums
Virtual nodes (not machines but lightweight processes; more like having eggs in many baskets – easier to give the eggs to folks during a failure)
Hinted handoff (“please pass along”)
Replication

Request -> riak
             |   <- ask other nodes ->   |
             |                           |
         virt node ->                virt node ->
             |                           |
         data store                  data store

And then return answers back up the stack
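A generic sketch of consistent hashing with virtual nodes, assuming SHA-1 ring positions and eight virtual nodes per server. The `Ring` class and server names are hypothetical, and this is not Riak's actual partition-claim scheme.

```python
import bisect
import hashlib

def ring_position(label: str) -> int:
    """Map any string to a point on the ring."""
    return int(hashlib.sha1(label.encode("utf-8")).hexdigest(), 16)

class Ring:
    def __init__(self, servers, vnodes_per_server=8):
        # Each server claims several positions (virtual nodes) on the ring.
        self._points = sorted(
            (ring_position(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes_per_server)
        )
        self._positions = [pos for pos, _ in self._points]

    def owner(self, key: str) -> str:
        # First virtual node clockwise from the key's position, wrapping around.
        index = bisect.bisect_right(self._positions, ring_position(key))
        return self._points[index % len(self._points)][1]

keys = [f"key{i}" for i in range(1000)]
before = Ring(["a", "b", "c"])
after = Ring(["a", "b", "c", "d"])
moved = sum(before.owner(k) != after.owner(k) for k in keys)
print(f"{moved} of {len(keys)} keys moved after adding a server")
```

Only keys that now belong to the new server change owner; everything else stays put, which is the property that makes "just add more" servers practical.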
SERVERS
“Just add more” servers
Ring architecture – all nodes are peers
Gossip protocols
KEYS AND BUCKETS
Riak can create keys automatically (and return the key to you)
http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true
^ gets all the keys
http://SERVER:PORT/riak/BUCKET?keys=stream
^ better for huge sets of data
You can store your code in a bucket
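A small helper that only builds URLs in the older /riak/BUCKET/KEY style shown on the slide; it makes no network calls. The function names, the "books" bucket, and the use of port 8098 are assumptions for illustration, and key listing is assumed to hang off the bucket URL.

```python
from urllib.parse import quote, urlencode

def object_url(server, port, bucket, key):
    """URL for one object, in the /riak/BUCKET/KEY shape."""
    return f"http://{server}:{port}/riak/{quote(bucket, safe='')}/{quote(key, safe='')}"

def list_keys_url(server, port, bucket, stream=False):
    """URL that asks for a bucket's keys; stream=True is meant for huge buckets."""
    query = urlencode({"keys": "stream" if stream else "true"})
    return f"http://{server}:{port}/riak/{quote(bucket, safe='')}?{query}"

print(object_url("localhost", 8098, "books", "hhgttg"))
print(list_keys_url("localhost", 8098, "books", stream=True))
```

Escaping the bucket and key with `quote` matters because keys are arbitrary strings and may contain slashes or spaces.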
LINKS
curl blah -H 'Link: </riak/BUCKET/KEY>; riaktag="tagname"'
Link walking
^ can create other structures
HOMEWORK AND OTHER READINGS: GENERAL
Brewer’s conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf
Vogels’ thoughts on eventually consistent: http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Old school techniques for “almost perfect” systems: “The Transaction Concept: Virtues and Limitations” by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
ACID defined: Haerder and Reuter, “Principles of Transaction-Oriented Database Recovery”, http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p
All your base: Dan Pritchett, “BASE: An Acid Alternative”, http://queue.acm.org/detail.cfm?id=1394128
NoSQL Distilled by Sadalage and Fowler
Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONT’D
Google’s Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
HBase: The Definitive Guide by Lars George
HBase in Action by Dimiduk and Khurana
Hadoop: The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONT’D
• A Little Riak Book by Eric Redmond – http://www.littleriakbook.com
• Nice video on system details on Safari by Justin Sheehy – https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True
• Riak Handbook – http://www.riakhandbook.com
READINGS FOR GRAPHS
Graph Databases by Robinson, Webber, and Eifrem
Mostly about Neo4j; uses Cypher throughout
RELATIONAL DATABASE EXAMPLES
http://db-engines.com/en/ranking_trend/relational+dbms
COLUMNAR EXAMPLES
httpdb-enginescomenranking_trendwide+column+store
KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication
KEY VALUE EXAMPLES
httpdb-enginescomenranking_trendkey-value+store
DOCUMENT STORES Similar to key-value but the value is a document
Document is stored in json (or similar) Flexible schema Some support keysreferencesindices
ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo
DOCUMENT EXAMPLES
httpdb-enginescomenranking_trenddocument+store
GRAPH DATABASES Remember your data structures class in college
Edges and vertices ndash both can hold data
Reduces tough sql queries to simple graph queries
Easier to model ndash lsquomatches the whiteboardrsquo
Relationships between vertices are first class
GRAPH DB EXAMPLES
httpdb-enginescomenranking_trenddocument+store
HBASE
NoSQL on top of Hadoop
SITS ON TOP OF HDFS
Name nodes
Data nodes
Replication
And the rest of that whole megillah
Column-oriented
Handles 'wide', 'sparse' tables well
Fault tolerant
Supports Java, REST, Avro and Thrift
All operations are atomic at the row level (via write-ahead logs)
KIND OF SQL
Key-values
Keys are arbitrary strings
Values are an entire row of data
No joins
Apache Phoenix: JDBC interface
COLUMN FAMILIES
Column's full name = family name & column qualifier
Each column family's performance is configured independently
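As a sketch of the naming rule, the toy class below models one wide, sparse row whose cells are addressed as `family:qualifier` (HBase joins the two parts with a colon). `SparseRow`, the `info` and `stats` families, and the sample values are assumptions for illustration; this is not the HBase client API.

```python
from collections import defaultdict


class SparseRow:
    """One wide, sparse row: only the cells that actually exist are stored."""

    def __init__(self):
        self._cells = defaultdict(dict)  # family -> {qualifier: value}

    def put(self, family, qualifier, value):
        self._cells[family][qualifier] = value

    def get(self, full_name):
        # Full column name = family name + ":" + column qualifier.
        family, qualifier = full_name.split(":", 1)
        return self._cells.get(family, {}).get(qualifier)

    def columns(self):
        return sorted(f"{family}:{qualifier}"
                      for family, qualifiers in self._cells.items()
                      for qualifier in qualifiers)


row = SparseRow()
row.put("info", "title", "Hitchhiker's Guide to the Galaxy")
row.put("info", "author", "Douglas Adams")
row.put("stats", "checkouts", 42)

print(row.columns())               # ['info:author', 'info:title', 'stats:checkouts']
print(row.get("stats:checkouts"))  # 42
print(row.get("info:isbn"))        # None: a missing cell costs nothing to store
```

Grouping cells by family is what lets each family be tuned on its own.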
REGIONS
Looks like shards - different key ranges per box, no overlap
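Because the key ranges do not overlap, finding the box for a row key is just a lookup in a sorted list of range start keys. The split points and `box-N` names below are hypothetical, and the sketch ignores region splits and rebalancing.

```python
import bisect

# Hypothetical split points: each region serves keys in [start, next_start).
region_starts = ["", "g", "n", "t"]  # "" is the lowest possible key
region_servers = ["box-1", "box-2", "box-3", "box-4"]


def region_for(row_key):
    """Return the single box whose key range contains row_key."""
    index = bisect.bisect_right(region_starts, row_key) - 1
    return region_servers[index]


for key in ["adams", "gray", "vogels"]:
    print(key, "->", region_for(key))
# adams -> box-1
# gray -> box-2
# vogels -> box-4
```

One consequence worth noticing: keys that sort next to each other land on the same box, so range scans are cheap but sequential keys can make one box hot.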
CASSANDRA
Tunable NoSQL
CAP WITH QUORUMS KNOB TWEAKING
Symmetric, peer to peer
Linearly scalable
Replication
Eventual consistency
Partitioning
CAP WITH QUORUMS KNOB TWEAKING
Some systems choose per event
Three knobs:
replication amount
how many successful writes == 'your write to the database is done'
how many successful reads out of a full set == 'here is your data'
The higher the values, the longer the wait
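The three knobs are often written N (replicas), W (write acknowledgements) and R (read responses); those letters come from the Dynamo-style literature, not from the slide. A small sketch of the trade-off, using made-up replica latencies: when R + W > N every read set overlaps every write set, and the wait is set by the slowest replica you insist on hearing from.

```python
def quorums_overlap(n, w, r):
    """True when any R replicas must include at least one of any W replicas."""
    return r + w > n


def wait_ms(replica_latencies_ms, needed):
    """Time until `needed` replicas have answered, i.e. the needed-th fastest."""
    return sorted(replica_latencies_ms)[needed - 1]


latencies = [5, 20, 90]  # hypothetical per-replica response times in ms
n = len(latencies)
for w, r in [(1, 1), (2, 2), (3, 1)]:
    print(f"N={n} W={w} R={r} "
          f"overlap={quorums_overlap(n, w, r)} "
          f"write wait={wait_ms(latencies, w)}ms "
          f"read wait={wait_ms(latencies, r)}ms")
# N=3 W=1 R=1 overlap=False write wait=5ms read wait=5ms
# N=3 W=2 R=2 overlap=True write wait=20ms read wait=20ms
# N=3 W=3 R=1 overlap=True write wait=90ms read wait=5ms
```

The last row shows why some systems let you choose per request: fast reads can be bought with slow writes, or the other way around.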
GOSSIP AND PEERING
Who's up?
Passing requests
Handling missing nodes
DATA
Column families
Keys and values
Speed via appending data and timestamps
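A toy model of "speed via appending data and timestamps": writes never update in place, they only append a timestamped record, and a read reconciles by letting the newest timestamp win for each column. The class name, the counter used as a clock, and the sample data are assumptions for illustration; Cassandra's real storage (memtables, SSTables, compaction) is far more involved.

```python
import itertools


class AppendOnlyColumnStore:
    """Toy model: every write appends (key, column, value, timestamp)."""

    def __init__(self):
        self._log = []
        self._clock = itertools.count(1)  # stand-in for a real timestamp source

    def write(self, key, column, value):
        # No read-before-write and no in-place update: just append.
        self._log.append((key, column, value, next(self._clock)))

    def read(self, key):
        # Reconcile on read: for each column, the newest timestamp wins.
        newest = {}
        for k, column, value, ts in self._log:
            if k == key and (column not in newest or ts > newest[column][1]):
                newest[column] = (value, ts)
        return {column: value for column, (value, _) in newest.items()}


store = AppendOnlyColumnStore()
store.write("user:1", "name", "Eric")
store.write("user:1", "city", "Newark")
store.write("user:1", "city", "Princeton")  # appended, not overwritten

print(store.read("user:1"))  # {'name': 'Eric', 'city': 'Princeton'}
```

The write path stays cheap because the reconciliation cost is pushed to reads (and, in real systems, to background compaction).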
KEY VALUE DYNAMO
Replication
REST
Protocol Buffers for queries
Tunable consistency
RIAK
Simple interface
High write-availability
Linear scaling
REST API via HTTP - PUT, GET, DELETE, POST, etc.
Or Protobufs for quicker serialized data
'Hundreds of nodes'
DISTRIBUTED
Consistent hashing
Vector clocks
Sloppy quorums
Virtual nodes (not machines but lightweight processes - more like having eggs in many baskets - easier to give the eggs to folks during a failure)
Hinted handoff ("please pass along")
Replication
Request -> riak
            |  <- ask other nodes ->
            |               |
        virt node ->    virt node ->
            |               |
        data store      data store
And then return answers back up the stack
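Consistent hashing with virtual nodes can be sketched in a few lines. This is a generic hash ring, not Riak's actual partitioning scheme (Riak divides the ring into a fixed number of equal partitions); the `Ring` class, the `node-x` names, the vnode count and the use of SHA-256 are all assumptions made for illustration.

```python
import bisect
import hashlib


def ring_position(label):
    # Any stable hash works for this sketch; SHA-256 is used for its even spread.
    return int(hashlib.sha256(label.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, servers, vnodes_per_server=8):
        # Each physical server owns several virtual nodes scattered on the ring.
        self._points = sorted(
            (ring_position(f"{server}#vnode{i}"), server)
            for server in servers
            for i in range(vnodes_per_server))
        self._positions = [position for position, _ in self._points]

    def preference_list(self, key, replicas=3):
        """Walk clockwise from the key's position, collecting distinct servers."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        owners = []
        for offset in range(len(self._points)):
            server = self._points[(start + offset) % len(self._points)][1]
            if server not in owners:
                owners.append(server)
            if len(owners) == replicas:
                break
        return owners


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.preference_list("BUCKET/KEY")
print(owners)  # three distinct servers; which three depends on the hash
```

If the first server on the list is down, a request can go to the next one on the walk, which is roughly what sloppy quorums and hinted handoff build on.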
SERVERS
"Just add more" servers
Ring architecture - all nodes are peers
Gossip protocols
KEYS AND BUCKETS
Riak can create keys automatically (and return the key to you)
http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true
^ gets all the keys
http://SERVER:PORT/riak/BUCKET?keys=stream
^ better for huge sets of data
You can store your code in a bucket
LINKS
curl blah -H "Link: </riak/BUCKET/KEY>; riaktag=\"tagname\""
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGS GENERAL
Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf
Vogels' thoughts on "Eventually Consistent": http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Old-school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83.p
All your base: Dan Pritchett, "BASE: An ACID Alternative", http://queue.acm.org/detail.cfm?id=1394128
NoSQL Distilled by Sadalage and Fowler
Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONT'D
Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
HBase: The Definitive Guide by Lars George
HBase in Action by Dimiduk and Khurana
Hadoop: The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONT'D
• A Little Riak Book by Eric Redmond - http://www.littleriakbook.com
• Nice video on system details on Safari by Justin Sheehy - https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True
• Riak Handbook - http://www.riakhandbook.com
READINGS FOR GRAPHS
Graph Databases by Robinson, Webber and Eifrem
Mostly about Neo4j; uses Cypher throughout
RELATIONAL DATABASE EXAMPLES
http://db-engines.com/en/ranking_trend/relational+dbms
- nosql
- Irsquom ERic
- Whatrsquos wrong with relational databases
- But first a poor metaphor
- Wait I want more performance
- But you canrsquot move IKEA furniture
- Well we can solve that problem
- The challenge of performance
- So what is NoSQL lsquoum non-relationalrsquo
- Sidebar observation on software teams
- Fowlerrsquos Impedance Mismatch
- CAP
- You can have two
- Only two the fine print
- Why would anyone be inconsistent
- Db chemistry ndash more buzz
- How to distribute the data
- What would Linnaeus say
- Columnar Stores
- Columnar Examples
- Key-Value Stores
- Key Value Examples
- Document Stores
- Document Examples
- Graph databases
- Graph DB Examples
- HBase
- Sits on top of HDFS
- Slide 29
- Kind of SQL
- Column families
- Regions
- Cassandra
- CAP with quorums knob tweaking
- CAP with quorums knob tweaking (2)
- Gossip and peering
- Data
- Key value dynamo
- Riak
- Distributed
- Servers
- Keys and buckets
- links
- Homework and other readings General
- Homework and other readings contrsquod
- Homework and other readings contrsquod
- Readings for graphs
- Relational database examples
-
KEY-VALUE STORES Designed for Speed (even memory-only) High load Global data model of key-values (surprise) Ring partition and replication
KEY VALUE EXAMPLES
httpdb-enginescomenranking_trendkey-value+store
DOCUMENT STORES Similar to key-value but the value is a document
Document is stored in json (or similar) Flexible schema Some support keysreferencesindices
ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo
DOCUMENT EXAMPLES
httpdb-enginescomenranking_trenddocument+store
GRAPH DATABASES Remember your data structures class in college
Edges and vertices ndash both can hold data
Reduces tough sql queries to simple graph queries
Easier to model ndash lsquomatches the whiteboardrsquo
Relationships between vertices are first class
GRAPH DB EXAMPLES
httpdb-enginescomenranking_trenddocument+store
HBASE Nosql on top of hadoop
SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah
Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift
All operations are atomic at the row level (via write ahead logs)
KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface
COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently
REGIONS Looks like shards ndash different key ranges per box no overlap
CASSANDRA Tunable nosql
CAP WITH QUORUMSKNOB TWEAKING
Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning
CAP WITH QUORUMSKNOB TWEAKING
Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the
database is donerdquo how many successful reads out of a full set ==
ldquohere is your datardquo
Higher the values longer the wait
GOSSIP AND PEERING Whose up Passing requests Handling missing nodes
DATA ColumnFamilies Keys and Values Speed via appending data and timestamps
KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency
RIAK
simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo
DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store
And then return answers back up the stack
SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols
KEYS AND BUCKETS Riak can create them automatically (and return to you the key)
httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket
LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGSGENERAL
Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf
Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml
Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc
eptpdf
ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p
All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128
NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONTrsquoD
Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen
archivebigtable-osdi06pdf
Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONTrsquoD
bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom
bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar
yviewriak-core9781449306144part00htmlautoStart=True
bull Riak Handbookndash httpwwwriakhandbookcom
READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out
RELATIONAL DATABASE EXAMPLES
httpdb-enginescomenranking_trendrelational+dbms
- nosql
- Irsquom ERic
- Whatrsquos wrong with relational databases
- But first a poor metaphor
- Wait I want more performance
- But you canrsquot move IKEA furniture
- Well we can solve that problem
- The challenge of performance
- So what is NoSQL lsquoum non-relationalrsquo
- Sidebar observation on software teams
- Fowlerrsquos Impedance Mismatch
- CAP
- You can have two
- Only two the fine print
- Why would anyone be inconsistent
- Db chemistry ndash more buzz
- How to distribute the data
- What would Linnaeus say
- Columnar Stores
- Columnar Examples
- Key-Value Stores
- Key Value Examples
- Document Stores
- Document Examples
- Graph databases
- Graph DB Examples
- HBase
- Sits on top of HDFS
- Slide 29
- Kind of SQL
- Column families
- Regions
- Cassandra
- CAP with quorums knob tweaking
- CAP with quorums knob tweaking (2)
- Gossip and peering
- Data
- Key value dynamo
- Riak
- Distributed
- Servers
- Keys and buckets
- links
- Homework and other readings General
- Homework and other readings contrsquod
- Homework and other readings contrsquod
- Readings for graphs
- Relational database examples
-
KEY VALUE EXAMPLES
httpdb-enginescomenranking_trendkey-value+store
DOCUMENT STORES Similar to key-value but the value is a document
Document is stored in json (or similar) Flexible schema Some support keysreferencesindices
ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo
DOCUMENT EXAMPLES
httpdb-enginescomenranking_trenddocument+store
GRAPH DATABASES Remember your data structures class in college
Edges and vertices ndash both can hold data
Reduces tough sql queries to simple graph queries
Easier to model ndash lsquomatches the whiteboardrsquo
Relationships between vertices are first class
GRAPH DB EXAMPLES
httpdb-enginescomenranking_trenddocument+store
HBASE Nosql on top of hadoop
SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah
Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift
All operations are atomic at the row level (via write ahead logs)
KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface
COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently
REGIONS Looks like shards ndash different key ranges per box no overlap
CASSANDRA Tunable nosql
CAP WITH QUORUMSKNOB TWEAKING
Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning
CAP WITH QUORUMSKNOB TWEAKING
Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the
database is donerdquo how many successful reads out of a full set ==
ldquohere is your datardquo
Higher the values longer the wait
GOSSIP AND PEERING Whose up Passing requests Handling missing nodes
DATA ColumnFamilies Keys and Values Speed via appending data and timestamps
KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency
RIAK
simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo
DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store
And then return answers back up the stack
SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols
KEYS AND BUCKETS Riak can create them automatically (and return to you the key)
httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket
LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGSGENERAL
Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf
Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml
Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc
eptpdf
ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p
All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128
NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONTrsquoD
Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen
archivebigtable-osdi06pdf
Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONTrsquoD
bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom
bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar
yviewriak-core9781449306144part00htmlautoStart=True
bull Riak Handbookndash httpwwwriakhandbookcom
READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out
RELATIONAL DATABASE EXAMPLES
httpdb-enginescomenranking_trendrelational+dbms
- nosql
- Irsquom ERic
- Whatrsquos wrong with relational databases
- But first a poor metaphor
- Wait I want more performance
- But you canrsquot move IKEA furniture
- Well we can solve that problem
- The challenge of performance
- So what is NoSQL lsquoum non-relationalrsquo
- Sidebar observation on software teams
- Fowlerrsquos Impedance Mismatch
- CAP
- You can have two
- Only two the fine print
- Why would anyone be inconsistent
- Db chemistry ndash more buzz
- How to distribute the data
- What would Linnaeus say
- Columnar Stores
- Columnar Examples
- Key-Value Stores
- Key Value Examples
- Document Stores
- Document Examples
- Graph databases
- Graph DB Examples
- HBase
- Sits on top of HDFS
- Slide 29
- Kind of SQL
- Column families
- Regions
- Cassandra
- CAP with quorums knob tweaking
- CAP with quorums knob tweaking (2)
- Gossip and peering
- Data
- Key value dynamo
- Riak
- Distributed
- Servers
- Keys and buckets
- links
- Homework and other readings General
- Homework and other readings contrsquod
- Homework and other readings contrsquod
- Readings for graphs
- Relational database examples
-
DOCUMENT STORES Similar to key-value but the value is a document
Document is stored in json (or similar) Flexible schema Some support keysreferencesindices
ldquodaterdquo[ 2016 04 01] ldquobooktitlerdquo rdquoHhitchhikers guide to the galaxyrdquo ldquoauthorrdquordquoDogulas Adamsrdquo
DOCUMENT EXAMPLES
httpdb-enginescomenranking_trenddocument+store
GRAPH DATABASES Remember your data structures class in college
Edges and vertices ndash both can hold data
Reduces tough sql queries to simple graph queries
Easier to model ndash lsquomatches the whiteboardrsquo
Relationships between vertices are first class
GRAPH DB EXAMPLES
httpdb-enginescomenranking_trenddocument+store
HBASE Nosql on top of hadoop
SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah
Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift
All operations are atomic at the row level (via write ahead logs)
KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface
COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently
REGIONS Looks like shards ndash different key ranges per box no overlap
CASSANDRA Tunable nosql
CAP WITH QUORUMSKNOB TWEAKING
Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning
CAP WITH QUORUMSKNOB TWEAKING
Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the
database is donerdquo how many successful reads out of a full set ==
ldquohere is your datardquo
Higher the values longer the wait
GOSSIP AND PEERING Whose up Passing requests Handling missing nodes
DATA ColumnFamilies Keys and Values Speed via appending data and timestamps
KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency
RIAK
simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo
DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store
And then return answers back up the stack
SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols
KEYS AND BUCKETS Riak can create them automatically (and return to you the key)
httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket
LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGSGENERAL
Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf
Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml
Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc
eptpdf
ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p
All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128
NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONTrsquoD
Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen
archivebigtable-osdi06pdf
Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONTrsquoD
bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom
bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar
yviewriak-core9781449306144part00htmlautoStart=True
bull Riak Handbookndash httpwwwriakhandbookcom
READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out
RELATIONAL DATABASE EXAMPLES
httpdb-enginescomenranking_trendrelational+dbms
- nosql
- Irsquom ERic
- Whatrsquos wrong with relational databases
- But first a poor metaphor
- Wait I want more performance
- But you canrsquot move IKEA furniture
- Well we can solve that problem
- The challenge of performance
- So what is NoSQL lsquoum non-relationalrsquo
- Sidebar observation on software teams
- Fowlerrsquos Impedance Mismatch
- CAP
- You can have two
- Only two the fine print
- Why would anyone be inconsistent
- Db chemistry ndash more buzz
- How to distribute the data
- What would Linnaeus say
- Columnar Stores
- Columnar Examples
- Key-Value Stores
- Key Value Examples
- Document Stores
- Document Examples
- Graph databases
- Graph DB Examples
- HBase
- Sits on top of HDFS
- Slide 29
- Kind of SQL
- Column families
- Regions
- Cassandra
- CAP with quorums knob tweaking
- CAP with quorums knob tweaking (2)
- Gossip and peering
- Data
- Key value dynamo
- Riak
- Distributed
- Servers
- Keys and buckets
- links
- Homework and other readings General
- Homework and other readings contrsquod
- Homework and other readings contrsquod
- Readings for graphs
- Relational database examples
-
DOCUMENT EXAMPLES
httpdb-enginescomenranking_trenddocument+store
GRAPH DATABASES Remember your data structures class in college
Edges and vertices ndash both can hold data
Reduces tough sql queries to simple graph queries
Easier to model ndash lsquomatches the whiteboardrsquo
Relationships between vertices are first class
GRAPH DB EXAMPLES
httpdb-enginescomenranking_trenddocument+store
HBASE Nosql on top of hadoop
SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah
Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift
All operations are atomic at the row level (via write ahead logs)
KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface
COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently
REGIONS Looks like shards ndash different key ranges per box no overlap
CASSANDRA Tunable nosql
CAP WITH QUORUMSKNOB TWEAKING
Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning
CAP WITH QUORUMSKNOB TWEAKING
Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the
database is donerdquo how many successful reads out of a full set ==
ldquohere is your datardquo
Higher the values longer the wait
GOSSIP AND PEERING Whose up Passing requests Handling missing nodes
DATA ColumnFamilies Keys and Values Speed via appending data and timestamps
KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency
RIAK
simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo
DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store
And then return answers back up the stack
SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols
KEYS AND BUCKETS Riak can create them automatically (and return to you the key)
httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket
LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGSGENERAL
Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf
Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml
Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc
eptpdf
ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p
All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128
NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONTrsquoD
Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen
archivebigtable-osdi06pdf
Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONTrsquoD
bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom
bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar
yviewriak-core9781449306144part00htmlautoStart=True
bull Riak Handbookndash httpwwwriakhandbookcom
READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out
RELATIONAL DATABASE EXAMPLES
httpdb-enginescomenranking_trendrelational+dbms
- nosql
- Irsquom ERic
- Whatrsquos wrong with relational databases
- But first a poor metaphor
- Wait I want more performance
- But you canrsquot move IKEA furniture
- Well we can solve that problem
- The challenge of performance
- So what is NoSQL lsquoum non-relationalrsquo
- Sidebar observation on software teams
- Fowlerrsquos Impedance Mismatch
- CAP
- You can have two
- Only two the fine print
- Why would anyone be inconsistent
- Db chemistry ndash more buzz
- How to distribute the data
- What would Linnaeus say
- Columnar Stores
- Columnar Examples
- Key-Value Stores
- Key Value Examples
- Document Stores
- Document Examples
- Graph databases
- Graph DB Examples
- HBase
- Sits on top of HDFS
- Slide 29
- Kind of SQL
- Column families
- Regions
- Cassandra
- CAP with quorums knob tweaking
- CAP with quorums knob tweaking (2)
- Gossip and peering
- Data
- Key value dynamo
- Riak
- Distributed
- Servers
- Keys and buckets
- links
- Homework and other readings General
- Homework and other readings contrsquod
- Homework and other readings contrsquod
- Readings for graphs
- Relational database examples
-
GRAPH DATABASES Remember your data structures class in college
Edges and vertices ndash both can hold data
Reduces tough sql queries to simple graph queries
Easier to model ndash lsquomatches the whiteboardrsquo
Relationships between vertices are first class
GRAPH DB EXAMPLES
httpdb-enginescomenranking_trenddocument+store
HBASE Nosql on top of hadoop
SITS ON TOP OF HDFS Name nodes Data nodes Replication And the rest of that whole megillah
Column-oriented Handles lsquowidersquo lsquosparsersquo tables well Fault tolerant Supports java REST Avro and Thrift
All operations are atomic at the row level (via write ahead logs)
KIND OF SQL Key ndash values Keys are arbitrary strings Values are a entire row of data No joins Apache Phoenix JDBC interface
COLUMN FAMILIES Columnrsquos fullname = family name amp column qualifier Each column familyrsquos performance is configured independently
REGIONS Looks like shards ndash different key ranges per box no overlap
CASSANDRA Tunable nosql
CAP WITH QUORUMSKNOB TWEAKING
Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning
CAP WITH QUORUMSKNOB TWEAKING
Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the
database is donerdquo how many successful reads out of a full set ==
ldquohere is your datardquo
Higher the values longer the wait
GOSSIP AND PEERING Whose up Passing requests Handling missing nodes
DATA ColumnFamilies Keys and Values Speed via appending data and timestamps
KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency
RIAK
simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo
DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store
And then return answers back up the stack
SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols
KEYS AND BUCKETS Riak can create them automatically (and return to you the key)
httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket
LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGSGENERAL
Brewerrsquos conjecture httpswwwcompnusedusg~gilbertpubsBrewersConjecture-SigActpdf
Vogelsrsquo thoughts on eventually Consistent httpwwwallthingsdistributedcom200812eventually_consistenthtml
Old school techniques for ldquoalmost perfectrdquo systems ldquoThe Transaction Concept Virtures and Limitationsrdquo by Jim Gray httpresearchmicrosoftcomen-usumpeoplegraypaperstheTransactionConc
eptpdf
ACID defined Haerder and Reuter Principles of transaction-oriented database recoveryrdquo httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p
All your base Dan Pritchett ldquoBase An Acid Alternativerdquo httpqueueacmorgdetailcfmid=1394128
NoSQL Distilled by Sadalage and Fowler Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONTrsquoD
Googlersquos big table httpstaticgoogleusercontentcommediaresearchgooglecomen
archivebigtable-osdi06pdf
Hbase The Definitive Guide by Lars George Hbase in Action by Dimiduk and Kurana Hadoop The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONTrsquoD
bull A Little Riak Book by Eric Redmondndash httpwwwlittleriakbookcom
bull Nice video on system details on safari by Justin Sheehyndash httpswwwsafaribooksonlinecomlibrar
yviewriak-core9781449306144part00htmlautoStart=True
bull Riak Handbookndash httpwwwriakhandbookcom
READINGS FOR GRAPHS Graph Databases by Robinson Webber and Eifrem Mostly about Neo4j uses Cypher through out
RELATIONAL DATABASE EXAMPLES
httpdb-enginescomenranking_trendrelational+dbms
- nosql
- Irsquom ERic
- Whatrsquos wrong with relational databases
- But first a poor metaphor
- Wait I want more performance
- But you canrsquot move IKEA furniture
- Well we can solve that problem
- The challenge of performance
- So what is NoSQL lsquoum non-relationalrsquo
- Sidebar observation on software teams
- Fowler's Impedance Mismatch
- CAP
- You can have two
- Only two the fine print
- Why would anyone be inconsistent
- Db chemistry - more buzz
- How to distribute the data
- What would Linnaeus say
- Columnar Stores
- Columnar Examples
- Key-Value Stores
- Key Value Examples
- Document Stores
- Document Examples
- Graph databases
- Graph DB Examples
- HBase
- Sits on top of HDFS
- Slide 29
- Kind of SQL
- Column families
- Regions
- Cassandra
- CAP with quorums knob tweaking
- CAP with quorums knob tweaking (2)
- Gossip and peering
- Data
- Key value dynamo
- Riak
- Distributed
- Servers
- Keys and buckets
- links
- Homework and other readings General
- Homework and other readings cont'd
- Homework and other readings cont'd
- Readings for graphs
- Relational database examples
GRAPH DB EXAMPLES
http://db-engines.com/en/ranking_trend/document+store
HBASE
NoSQL on top of Hadoop
SITS ON TOP OF HDFS
• Name nodes
• Data nodes
• Replication
• And the rest of that whole megillah
• Column-oriented
• Handles 'wide', 'sparse' tables well
• Fault tolerant
• Supports Java, REST, Avro, and Thrift
• All operations are atomic at the row level (via write-ahead logs)
KIND OF SQL
• Key - values
• Keys are arbitrary strings
• Values are an entire row of data
• No joins
• Apache Phoenix: JDBC interface
COLUMN FAMILIES
• Column's full name = family name & column qualifier
• Each column family's performance is configured independently
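A minimal sketch of the "full name = family & qualifier" idea, assuming the usual `family:qualifier` notation. The row contents and helper names are made up for illustration; this is not the HBase client API.

```python
from collections import defaultdict


def split_column(full_name):
    """Split a 'family:qualifier' column name into its two parts."""
    family, _, qualifier = full_name.partition(":")
    return family, qualifier


def group_by_family(row):
    """Group the cells of one sparse row by column family."""
    families = defaultdict(dict)
    for full_name, value in row.items():
        family, qualifier = split_column(full_name)
        families[family][qualifier] = value
    return dict(families)


# Hypothetical sparse row: only the cells that exist are stored.
row = {"info:name": "Ada", "info:email": "ada@example.com", "stats:logins": 42}
grouped = group_by_family(row)
```

Grouping by family is what lets each family be tuned on its own, since its cells are kept together.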
REGIONS
• Looks like shards - different key ranges per box, no overlap
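A rough sketch of "different key ranges per box, no overlap": sorted start keys plus a binary search pick the one box that owns a row key. The start keys and box names are invented for the example.

```python
import bisect

# Hypothetical layout: each region serves keys from its start key (inclusive)
# up to the next region's start key (exclusive), so ranges never overlap.
REGION_STARTS = ["", "g", "n", "t"]               # sorted start keys
REGION_SERVERS = ["box1", "box2", "box3", "box4"]  # one box per region


def region_for(row_key):
    """Return the box whose key range contains row_key."""
    index = bisect.bisect_right(REGION_STARTS, row_key) - 1
    return REGION_SERVERS[index]
```

For instance, `region_for("apple")` lands on the first region and `region_for("zebra")` on the last.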
CASSANDRA
Tunable NoSQL
CAP WITH QUORUMS: KNOB TWEAKING
• Symmetric peer to peer
• Linearly scalable
• Replication
• Eventual consistency
• Partitioning
CAP WITH QUORUMS: KNOB TWEAKING
• Some systems choose per event
• Three knobs:
  - replication amount
  - how many successful writes == "your writing to the database is done"
  - how many successful reads out of a full set == "here is your data"
• Higher the values, longer the wait
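The three knobs can be sketched as plain arithmetic. The names N, W, and R follow common Dynamo-style usage and are an assumption here, as are the function names. With strict quorums, a read is guaranteed to overlap the latest acknowledged write when R + W > N.

```python
def overlaps(n, w, r):
    """True when every read quorum must share a replica with every write quorum."""
    return r + w > n


def write_acknowledged(acks, w):
    """A write counts as done once at least w replicas confirm it."""
    return acks >= w


def read_answered(replies, r):
    """A read can be answered once at least r replicas reply."""
    return replies >= r
```

With N=3, choosing W=2 and R=2 overlaps; W=1 and R=1 returns sooner but may read stale data.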
GOSSIP AND PEERING
• Who's up
• Passing requests
• Handling missing nodes
DATA
• Column families
• Keys and values
• Speed via appending data and timestamps
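One way to read "appending data and timestamps": writes are only ever appended, and a read picks the version with the newest timestamp. The toy class below is an illustration under that assumption, not Cassandra's actual storage engine.

```python
class AppendOnlyStore:
    """Toy store: writes append, reads resolve by latest timestamp."""

    def __init__(self):
        self.log = []  # list of (key, timestamp, value) tuples, never updated in place

    def write(self, key, value, timestamp):
        # Appending avoids a read-modify-write cycle on the write path.
        self.log.append((key, timestamp, value))

    def read(self, key):
        versions = [(ts, value) for k, ts, value in self.log if k == key]
        if not versions:
            return None
        # Last write wins: the highest timestamp decides, not arrival order.
        return max(versions, key=lambda pair: pair[0])[1]


store = AppendOnlyStore()
store.write("a", "old", timestamp=1)
store.write("a", "new", timestamp=5)
store.write("a", "late-arriving", timestamp=3)
```

Here `store.read("a")` returns the value written with timestamp 5, even though another write arrived after it.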
KEY VALUE DYNAMO
• Replication
• REST
• Protocol Buffers for queries
• Tunable consistency
RIAK
• Simple interface
• High write-availability
• Linear scaling
• REST API via HTTP - PUT, GET, DELETE, POST, etc.
• Or Protobufs for quicker serialized data
• 'Hundreds of nodes'
DISTRIBUTED
• Consistent hashing
• Vector clocks
• Sloppy quorums
• Virtual nodes (not machines but lightweight processes - more like having eggs in many baskets - easier to give the eggs to folks during a failure)
• Hinted handoff ("please pass along")
• Replication
Request -> riak
             |  <- ask other nodes ->
             |                  |
        virt node ->       virt node ->
             |                  |
        data store         data store
And then return answers back up the stack
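A generic sketch of consistent hashing with virtual nodes, to make the "eggs in many baskets" picture concrete. Each server is placed on the ring several times, and a key's replicas are the next distinct servers clockwise. The class, the MD5 choice, and the server names are assumptions for illustration; Riak's real ring partitioning differs in detail.

```python
import bisect
import hashlib


def _hash(text):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, servers, vnodes_per_server=8):
        # Each server appears at several ring positions (its virtual nodes).
        self._points = sorted(
            (_hash(f"{server}#{i}"), server)
            for server in servers
            for i in range(vnodes_per_server)
        )
        self._hashes = [h for h, _ in self._points]

    def preference_list(self, key, replicas=3):
        """Walk clockwise from the key's position, collecting distinct servers."""
        start = bisect.bisect_right(self._hashes, _hash(key))
        owners = []
        for offset in range(len(self._points)):
            server = self._points[(start + offset) % len(self._points)][1]
            if server not in owners:
                owners.append(server)
            if len(owners) == replicas:
                break
        return owners


ring = Ring(["a", "b", "c", "d"])
owners = ring.preference_list("user:42")
```

Because a failed server's virtual nodes are scattered around the ring, its keys spread across many neighbours instead of piling onto one.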
SERVERS
• "Just add more" servers
• Ring architecture - all nodes are peers
• Gossip protocols
KEYS AND BUCKETS
Riak can create them automatically (and return to you the key)
http://SERVER:PORT/riak/BUCKET/KEY
http://SERVER:PORT/riak/BUCKET?keys=true
^ gets all the keys
http://SERVER:PORT/riak/BUCKET?keys=stream
^ better for huge sets of data
You can store your code in a bucket
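A small sketch that only builds the URLs from the slide, assuming the older `/riak/BUCKET/KEY` layout and that key listing is requested at the bucket level. The helper names, host, port, bucket, and key are placeholders; nothing here talks to a server.

```python
from urllib.parse import quote


def object_url(server, port, bucket, key):
    """URL of a single object: /riak/BUCKET/KEY."""
    return f"http://{server}:{port}/riak/{quote(bucket, safe='')}/{quote(key, safe='')}"


def list_keys_url(server, port, bucket, stream=False):
    """URL that lists a bucket's keys; stream=True asks for a streamed reply."""
    mode = "stream" if stream else "true"
    return f"http://{server}:{port}/riak/{quote(bucket, safe='')}?keys={mode}"
```

Percent-encoding the bucket and key keeps slashes or spaces in names from changing the path.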
LINKS
curl blah -H "Link: </riak/BUCKET/KEY>; riaktag=\"tagname\""
Link walking
^ can create other structures
REGIONS Looks like shards ndash different key ranges per box no overlap
CASSANDRA Tunable nosql
CAP WITH QUORUMSKNOB TWEAKING
Symmetric peer to peer Linearly scalable Replication Eventually consistency Partitioning
CAP WITH QUORUMSKNOB TWEAKING
Some systems choose per event Three knobs replication amount how many successful writes == lsquoyour writing to the
database is donerdquo how many successful reads out of a full set ==
ldquohere is your datardquo
Higher the values longer the wait
GOSSIP AND PEERING Whose up Passing requests Handling missing nodes
DATA ColumnFamilies Keys and Values Speed via appending data and timestamps
KEY VALUE DYNAMOReplicationRESTProtocol Buffers for queriesTunable consistency
RIAK
simple interface high write-availability linear scalingRest api via http ndash put get delete post etcOr Protobufs for quicker serialized datalsquohundreds of nodesrsquo
DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store
And then return answers back up the stack
SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols
KEYS AND BUCKETS Riak can create them automatically (and return to you the key)
httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket
LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGS: GENERAL
• Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf
• Vogels' thoughts on "Eventually Consistent": http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
• Old-school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
• ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", http://www.minet.uni-jena.de/dbis/lehre/ws2005/dbs1/HaerderReuter83p
• All your base: Dan Pritchett, "BASE: An ACID Alternative", http://queue.acm.org/detail.cfm?id=1394128
• NoSQL Distilled by Sadalage and Fowler
• Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS, CONT'D
• Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
• HBase: The Definitive Guide by Lars George
• HBase in Action by Dimiduk and Khurana
• Hadoop: The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS, CONT'D
• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com
• Nice video on system details on Safari by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True
• Riak Handbook: http://www.riakhandbook.com
READINGS FOR GRAPHS
Graph Databases by Robinson, Webber, and Eifrem. Mostly about Neo4j; uses Cypher throughout.
RELATIONAL DATABASE EXAMPLES
http://db-engines.com/en/ranking_trend/relational+dbms
- nosql
- I'm Eric
- What's wrong with relational databases
- But first a poor metaphor
- Wait I want more performance
- But you can't move IKEA furniture
- Well we can solve that problem
- The challenge of performance
- So what is NoSQL 'um non-relational'
- Sidebar observation on software teams
- Fowler's Impedance Mismatch
- CAP
- You can have two
- Only two the fine print
- Why would anyone be inconsistent
- Db chemistry - more buzz
- How to distribute the data
- What would Linnaeus say
- Columnar Stores
- Columnar Examples
- Key-Value Stores
- Key Value Examples
- Document Stores
- Document Examples
- Graph databases
- Graph DB Examples
- HBase
- Sits on top of HDFS
- Slide 29
- Kind of SQL
- Column families
- Regions
- Cassandra
- CAP with quorums knob tweaking
- CAP with quorums knob tweaking (2)
- Gossip and peering
- Data
- Key value dynamo
- Riak
- Distributed
- Servers
- Keys and buckets
- links
- Homework and other readings General
- Homework and other readings cont'd
- Homework and other readings cont'd
- Readings for graphs
- Relational database examples
DISTRIBUTED Consistent hashing vector clocks sloppy quorums virtual nodes (not machines but light weight processess - more like having eggs in many baskets ndash easier to give the eggs to folks during a failure) hinted hand off (ldquoplease pass alongrdquo) replicationRequest -gt riak | lt- ask other nodes -gt | | virt node -gt virt node -gt | | data store data store
And then return answers back up the stack
SERVERS ldquojust add morerdquo servers Ring architecture ndash all nodes are peers gossip protocols
KEYS AND BUCKETS Riak can create them automatically (and return to you the key)
httpSERVERPORTriakBUCKETKEY httpSERVERPORTriakBUCKETKEYkeys=true ^ gets all the keys httpSERVERPORTriakBUCKETKEYkeys=stream ^better for huge sets of data You can store your code in a bucket
LINKS Curl blah ndashH ldquolink riakBUCKETKEY riaktag=rdquotagnamerdquo
Link walking ^ can create other structures
HOMEWORK AND OTHER READINGS: GENERAL
Brewer's conjecture: https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf
Vogels' thoughts on "Eventually Consistent": http://www.allthingsdistributed.com/2008/12/eventually_consistent.html
Old school techniques for "almost perfect" systems: "The Transaction Concept: Virtues and Limitations" by Jim Gray, http://research.microsoft.com/en-us/um/people/gray/papers/theTransactionConcept.pdf
ACID defined: Haerder and Reuter, "Principles of Transaction-Oriented Database Recovery", httpwwwminetuni-jenadedbislehrews2005dbs1HaerderReuter83p
All your BASE: Dan Pritchett, "BASE: An ACID Alternative", http://queue.acm.org/detail.cfm?id=1394128
NoSQL Distilled by Sadalage and Fowler
Seven Databases in Seven Weeks by Redmond and Wilson
HOMEWORK AND OTHER READINGS CONT'D
Google's Bigtable: http://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf
HBase: The Definitive Guide by Lars George
HBase in Action by Dimiduk and Khurana
Hadoop: The Definitive Guide by Tom White
HOMEWORK AND OTHER READINGS CONT'D
• A Little Riak Book by Eric Redmond: http://www.littleriakbook.com
• Nice video on system details on Safari by Justin Sheehy: https://www.safaribooksonline.com/library/view/riak-core/9781449306144/part00.html?autoStart=True
• Riak Handbook: http://www.riakhandbook.com
READINGS FOR GRAPHS
Graph Databases by Robinson, Webber, and Eifrem. Mostly about Neo4j; uses Cypher throughout.
RELATIONAL DATABASE EXAMPLES
http://db-engines.com/en/ranking_trend/relational+dbms