Understanding, Choosing & Instrumenting NOSQL
Understanding, Choosing & Instrumenting NOSQL
A presentation by @timanglade, from your friends at @cloudant
@timanglade
/taɪn/ A database engine so tiny, its name had to be shortened
Tin
NOSQLTAPES.com @NOSQLTAPES
You have a scaling problem
So you decide to give NOSQL a try
Now you have two problems
That’s the reality of NOSQL today
So why bother? Oh, don’t worry, I’m going to tell you…
Why & how do I NOSQL?
Understanding, Choosing & Instrumenting NOSQL
Understanding your problem & NOSQL
NOSQL is about two things
1. Performance 2. Distribution
N.B. To varying degrees in each NOSQL project…
Performance?
Performance: Performance/$
Look at your RDBMS first…
A-B-C
Tip #1. Cost: close to $0…
A Always
B Be
C CACHING
Always Be Caching
If you don’t, I’ve no sympathy for you, pal. Also, always be rewatching Glengarry Glen Ross. Great movie. Fantastic cast.
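Tip #1 in code form: a minimal sketch of a read-through cache with a time-to-live, sitting in front of an expensive query. `TTLCache`, `get_or_load` and `slow_query` are hypothetical names invented for illustration; in practice a dedicated cache server usually plays this role.

```python
import time


class TTLCache:
    """Tiny read-through cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_load(self, key, loader):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: the database is not touched
        value = loader(key)        # cache miss: run the expensive query
        self._store[key] = (now + self.ttl, value)
        return value


calls = []


def slow_query(user_id):
    # Hypothetical stand-in for an expensive RDBMS query.
    calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_load(42, slow_query)
second = cache.get_or_load(42, slow_query)
print(len(calls))  # 1: the second read was served from the cache
```

The TTL is the knob to watch: a longer one takes more load off the database, at the cost of serving staler data.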
Index. For God’s sake, index.
Tip #2. Still for close to $0…
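To see what Tip #2 buys you, here is a small sketch using Python's built-in `sqlite3` module. The `users` table and `idx_users_email` index are made up for the example. It asks the database for its query plan before and after adding an index; other RDBMSs have their own `EXPLAIN` variants, and the exact wording of the plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)


def query_plan(sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT name FROM users WHERE email = ?"
args = ("user500@example.com",)

plan_before = query_plan(lookup, args)  # a full scan of the table
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(lookup, args)   # a search that uses the new index

print(plan_before)
print(plan_after)
```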
If you know SQL, you can NOSQL
Tip #3. Did you go to school? Then yes, still close to zero
Hire a database consultant
Tip #4. For a fistful of dollars…
Buy a bigger box
Tip #5. For a few dollars more…
5.1 TB, 1.2 M IOPS, 100 K dollars
A word about “the Cloud”
Tip #6. Say, that’s a really nice shirt you’re wearing there. I’m sure we could come to an arrangement…
You have a scaling problem, so you decide to give the Cloud a try
Now you have X problems, where X is arbitrarily large
The greatest sandbox ever made
The best bootcamp ever designed
BTW, these tips apply to NOSQL too…
Did all that? Don’t have the $$$?
Welcome to NOSQL
NOSQL lets you use skills instead of $$$. But do you have the skills? Can you get them?
Distribution
What if you can’t buy a box any bigger?
Network partitions happen
Brownouts happen
Distribution is a (mostly) efficient way to add more capacity & availability to your DB
A word about master-slave replication
A word about the CAP theorem
Choosing a NOSQL database
How do I choose?
Same 2 parameters: Distribution + “Performance”
What does performance mean?
Performance: Data / Query Model, Disk Structure
The Moon Methodology™
1. Distribution 2. Data / Query Model 3. Disk Structure
Distribution: Dynamo-style, master-slave, master-master
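For a feel of what "Dynamo-style" means, here is a rough sketch of a consistent-hash ring, the key-placement scheme that family of systems is generally built around. `HashRing`, `preference_list`, the node names and the use of MD5 are all assumptions made for illustration; real implementations differ in hashing, partitioning and failure handling.

```python
import bisect
import hashlib


def ring_position(value):
    # Map any string to a point on the ring using a stable hash.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=64):
        # Each node gets several virtual points to even out the distribution.
        self._points = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._points]
        self._node_count = len(set(nodes))

    def preference_list(self, key, replicas=3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        wanted = min(replicas, self._node_count)
        start = bisect.bisect_right(self._positions, ring_position(key))
        owners = []
        for offset in range(len(self._points)):
            node = self._points[(start + offset) % len(self._points)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == wanted:
                    break
        return owners


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
owners = ring.preference_list("user:42")
print(owners)  # three distinct nodes that would hold copies of this key
```

The point of the ring is that adding a node only moves the keys the new node takes over. Every other key stays where it was, which is what makes growing the cluster (mostly) cheap.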
Data / Query Model: Map/Reduce everywhere?
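If Map/Reduce is new to you, the whole idea fits in a few lines. This is a toy, in-memory sketch with invented documents and function names, not the API of any of the databases discussed here: a map function emits key/value pairs per document, and a reduce function folds the values collected for each key.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):  # map: each doc emits zero or more pairs
            groups[key].append(value)
    # reduce: fold the values gathered under each key into one result
    return {key: reduce_fn(key, values) for key, values in groups.items()}


docs = [
    {"type": "order", "customer": "alice", "total": 30},
    {"type": "order", "customer": "bob", "total": 12},
    {"type": "order", "customer": "alice", "total": 5},
    {"type": "note", "text": "not an order"},
]


def emit_order_total(doc):
    if doc.get("type") == "order":
        yield doc["customer"], doc["total"]


def sum_values(key, values):
    return sum(values)


totals = map_reduce(docs, emit_order_total, sum_values)
print(totals)  # {'alice': 35, 'bob': 12}
```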
Disk Structures: the devil is in the (implementation) details
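To make "disk structure" less abstract, here is a deliberately naive sketch of one family: an append-only file plus an in-memory hash index pointing at the latest value for each key. `AppendOnlyStore` is a hypothetical class written for this example and is not the on-disk format of any database named in this talk. It has no crash recovery, no compaction and no checksums.

```python
import os
import tempfile


class AppendOnlyStore:
    """Values are only ever appended; a dict maps each key to its latest offset."""

    def __init__(self, path):
        self._file = open(path, "a+b")
        self._index = {}  # key -> (offset, length) of the most recent value

    def put(self, key, value):
        data = value.encode("utf-8")
        self._file.seek(0, os.SEEK_END)
        offset = self._file.tell()
        self._file.write(data)
        self._file.flush()
        self._index[key] = (offset, len(data))

    def get(self, key):
        offset, length = self._index[key]
        self._file.seek(offset)
        return self._file.read(length).decode("utf-8")

    def close(self):
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "data.log")
store = AppendOnlyStore(path)
store.put("a", "1")
store.put("b", "2")
store.put("a", "3")  # the old value stays in the file; only the index moves
latest_a = store.get("a")
latest_b = store.get("b")
size_on_disk = os.path.getsize(path)
store.close()
print(latest_a, latest_b, size_on_disk)  # 3 2 3
```

Writes are sequential and therefore fast, but the file keeps growing until something compacts it. That trade-off is exactly the kind of implementation detail worth knowing before you pick a database.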
Now let’s look at major NOSQL DBs through this lens…
CouchDB: Master-master, Doc + Persistent M/R, Append-only B+ Tree
Small scale, queries don’t change, HTTP is a must
Ideal scenario
Couchbase 2.0: Master-slave, K/V + Persistent M/R, Append-only B+ Tree
?
Ideal scenario
BigCouch: Dynamo, Doc + Persistent M/R, Append-only B+ Tree
Same as CouchDB, bigger scale
Ideal scenario
Cassandra: Dynamo, Column Families, Log + SSTable
Fast writes + don’t mind hacking
Ideal scenario
Riak: Multi-DC Dynamo, K/V + M/R + 2ary, Log-Struct. Hash Table (& others)
Knowledgeable team, really large scale
Ideal scenario
MongoDB: Master-slave, Docs + M/R + 2ary, Log + B-Tree
Prototyping
Ideal scenario
Redis: Master-slave (?), Many, Log + Many (?)
Application “glue”
Ideal scenario
Please don’t sniff your Redis
A plea for mercy
Neo4j: Master-slave (~), OO + REST, Custom Graph Struct.
Lots of self-joins
Ideal scenario
Instrumenting your stack
You will fail
Oh yes, yes you will…
Any advanced distributed system will behave like a black box
Testing is fine, but measuring is more useful in this context
Percentile response times
Some stuff to keep an eye on
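Why percentiles rather than averages? A short sketch, using the nearest-rank definition of a percentile and made-up latency samples. `percentile` is a helper written for this example, and monitoring tools may use a slightly different interpolation.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical response times in milliseconds: nine fast requests, one very slow.
latencies_ms = [12, 15, 11, 14, 13, 12, 16, 13, 950, 14]

mean = sum(latencies_ms) / len(latencies_ms)
p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
print(mean, p50, p90, p99)  # 107.0 13 16 950
```

The mean (107 ms) describes no request anyone actually experienced. The median shows the typical case and the high percentiles expose the tail.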
Error rates
Some stuff to keep an eye on
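One possible way to track an error rate is over a sliding time window, so that old failures age out. This is a sketch under assumptions: `ErrorRateWindow` is a hypothetical helper, timestamps are plain seconds supplied by the caller, and events are expected to arrive in time order.

```python
from collections import deque


class ErrorRateWindow:
    def __init__(self, window_seconds):
        self.window = window_seconds
        self._events = deque()  # (timestamp, is_error), oldest first

    def record(self, timestamp, is_error):
        self._events.append((timestamp, is_error))

    def rate(self, now):
        # Drop events that have fallen out of the window.
        while self._events and self._events[0][0] <= now - self.window:
            self._events.popleft()
        if not self._events:
            return 0.0
        errors = sum(1 for _, is_error in self._events if is_error)
        return errors / len(self._events)


window = ErrorRateWindow(window_seconds=60)
window.record(0, False)
window.record(10, True)
window.record(20, False)
window.record(30, True)
rate_now = window.rate(now=30)    # 2 errors out of 4 requests
rate_later = window.rate(now=65)  # the request at t=0 has aged out
print(rate_now, round(rate_later, 2))  # 0.5 0.67
```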
Memory usage & stack depth
Some stuff to keep an eye on
CPU usage & number of processes
Some stuff to keep an eye on
Disk usage & IOPS
Some stuff to keep an eye on
Hawk the graphs over long periods
Instrumentation & metrology are still dark arts
Next Steps
Coda Hale’s Metrics Everywhere: pivotallabs.com/talks/139-metrics-metrics-everywhere
Find a monitoring system that works for you
Recap
1. NOSQL is hard
2. Know your RDBMS, know your problem
3. Pick a DB by distribution, query & disk models
4. Instrument the heck out of it
5. Rinse
6. Repeat
Go experiment. Deploy. Measure. Improve. Have fun.