databases and queries: matching performance and reliability
D A T A B A S E S A N D Q U E R I E S : M A T C H I N G P E R F O R M A N C E A N D R E L I A B I L I T Y
Dave Smith
VP, Engineering
@dizzyd
S U R V E Y
• Who has hit problems scaling RDBMS?
• Who is using non-relational databases?
Q U E R I E S
• Relational
• Key/value (document)
• Text retrieval (full-text search)
• Graph
• Time-series
• Geospatial
Q U E R I E S ( C O N T. )
• What questions are you asking of your data?
  • Get a record by a key
  • Find records based on a relationship
  • Find all documents with a given term
  • Apply operation to metrics within a timeframe
It is possible to rewrite most queries in other forms.
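As a rough illustration of rewriting a query in another form, the sketch below answers the same question ("which orders belong to this customer?") twice: once as a relational-style filter over every record, and once as a key/value get against a secondary index built at write time. The `orders` data and all function names are hypothetical and are not taken from the talk.

```python
from collections import defaultdict

# Hypothetical sample data: orders stored by order id.
orders = {
    "o1": {"customer": "alice", "total": 30},
    "o2": {"customer": "bob", "total": 12},
    "o3": {"customer": "alice", "total": 5},
}


def orders_for_customer_scan(customer):
    """Relational-style form: test a predicate against every record."""
    return sorted(oid for oid, rec in orders.items() if rec["customer"] == customer)


# Key/value form: keep a secondary index keyed by customer, maintained
# whenever an order is written, so the question becomes a get by key.
orders_by_customer = defaultdict(list)
for oid, rec in orders.items():
    orders_by_customer[rec["customer"]].append(oid)


def orders_for_customer_lookup(customer):
    """Key/value form: a single lookup in the precomputed index."""
    return sorted(orders_by_customer.get(customer, []))
```

Both functions return the same answer; the difference is where the work happens. The scan pays on every read, while the index pays on every write, which is why the read/write mix matters when picking a form.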
P E R F O R M A N C E
• Access patterns
  • Read/write mix
  • Sequential vs. Pareto vs. uniformly random
• Throughput - how many requests/sec?
• Latency - how long does it take to service a single request?
  • Always a distribution! Mean is meaningless…
• Data size
  • Total size of dataset
  • Size per item in dataset
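To make the point about latency being a distribution concrete, here is a minimal sketch with made-up numbers: 98 requests served in 10 ms and 2 requests that take a full second. The sample data and the `percentile` helper (a simple nearest-rank method, assuming an integer percentile) are illustrative assumptions, not measurements from any real system.

```python
import statistics


def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of p/100 * n, avoiding floating-point rounding.
    rank = max(1, (p * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical latencies: mostly fast, with a small slow tail.
latencies_ms = [10] * 98 + [1000] * 2

mean_ms = statistics.mean(latencies_ms)   # 29.8
p50_ms = percentile(latencies_ms, 50)     # 10
p99_ms = percentile(latencies_ms, 99)     # 1000
```

The mean of 29.8 ms describes no request that actually happened: the typical request took 10 ms and the slowest 2% took 1000 ms. Reporting a median alongside a high percentile shows both sides of that picture.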
R E L I A B I L I T Y
• How can databases fail?
  • Disks -> integrity checking
  • Nodes -> replication
  • Network -> versioning
  • Software -> (all of above)
  • Overload -> elasticity
• Key questions
  • How well does the system tolerate failure?
  • How well does the system deal with unexpected load?
It can be impossible to distinguish between a slow node and a failed node.
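The slow-versus-failed problem can be sketched from the caller's point of view. The toy model below assumes a caller that can observe only one thing: whether a reply arrived before its timeout. The `probe` function and the timings are hypothetical; no real network is involved.

```python
def probe(response_time_s, timeout_s):
    """Return what the caller observes: 'ok' or 'timeout'.

    response_time_s is None for a crashed node that never answers.
    """
    if response_time_s is not None and response_time_s <= timeout_s:
        return "ok"
    return "timeout"


TIMEOUT_S = 1.0

healthy = probe(0.02, TIMEOUT_S)   # replies in 20 ms
slow = probe(5.0, TIMEOUT_S)       # alive, but replies after 5 s
crashed = probe(None, TIMEOUT_S)   # never replies
```

The slow node and the crashed node produce the same observation, so the caller cannot tell them apart. A longer timeout reduces false failure reports at the cost of slower detection; it does not remove the ambiguity.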
U G L Y T R U T H S
• All databases require tuning
• Failure is hard to test — most people don’t bother
• Networks fail — especially under high load
• The more your database does, the more ways it can fail
  • More code == more bugs
C H O I C E S , C H O I C E S …
• MySQL, Postgres, Oracle
• CouchDB, MongoDB, RethinkDB
• Riak, Cassandra
• HBase, Hypertable
• MemSQL, Couchbase
• Elasticsearch, Solr
• Neo4j, Titan