databases and queries: matching performance and reliability

Dave Smith

VP, Engineering

@dizzyd

SURVEY

• Who has hit problems scaling an RDBMS?

• Who is using non-relational databases?

QUERIES

• Relational

• Key/value (document)

• Text retrieval (full-text search)

• Graph

• Time-series

• Geospatial

QUERIES (CONT.)

• What questions are you asking of your data?

    • Get a record by a key

    • Find records based on a relationship

    • Find all documents with a given term

    • Apply operation to metrics within a timeframe

It is possible to rewrite most queries in other forms.
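
As a rough illustration of rewriting a query in another form, the sketch below answers the same relationship question ("which orders belong to this user?") two ways: by filtering on the relationship, and by a single key lookup against an index built at write time. The `users`/`orders` data and both function names are hypothetical, chosen only for this example.

```python
# Hypothetical data: orders that reference users by user_id.
orders = [
    {"id": 10, "user_id": 1, "total": 25},
    {"id": 11, "user_id": 2, "total": 40},
    {"id": 12, "user_id": 1, "total": 15},
]


def orders_by_scan(user_id):
    """Relational form: filter every order on the relationship column."""
    return [o["id"] for o in orders if o["user_id"] == user_id]


# Key/value form: maintain an index keyed by user as orders are written.
orders_by_user = {}
for o in orders:
    orders_by_user.setdefault(o["user_id"], []).append(o["id"])


def orders_by_key(user_id):
    """Key/value form: one lookup against the precomputed index."""
    return orders_by_user.get(user_id, [])


print(orders_by_scan(1))  # [10, 12]
print(orders_by_key(1))   # [10, 12]
```

Both forms return the same answer; the difference is where the work happens. The scan pays at read time, while the index pays at write time and has to be kept in sync.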

PERFORMANCE

• Access patterns

    • Read/write mix

    • Sequential vs. Pareto vs. uniformly random

• Throughput - how many requests/sec?

• Latency - how long does it take to service a single request?

    • Always a distribution! Mean is meaningless…

• Data size

    • Total size of dataset

    • Size per item in dataset
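
To make the "latency is a distribution" point concrete, here is a minimal sketch that compares the mean with percentiles. The workload is made up for illustration (about 99% of requests near 5 ms, about 1% stalling near 500 ms), and `percentile` is a simple nearest-rank helper written for this example, not a library function.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical workload: most requests are fast, a small fraction stall.
latencies_ms = [
    random.gauss(500, 50) if random.random() < 0.01 else random.gauss(5, 1)
    for _ in range(100_000)
]


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least
    pct percent of all samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


print(f"mean  = {statistics.mean(latencies_ms):.1f} ms")
for pct in (50, 99, 99.9):
    print(f"p{pct} = {percentile(latencies_ms, pct):.1f} ms")
```

With a workload shaped like this, the mean lands between the two clusters and describes neither the typical request nor the stalled one, while the median and the high percentiles show both.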

RELIABILITY

• How can databases fail?

    • Disks -> integrity checking

    • Nodes -> replication

    • Network -> versioning

    • Software -> (all of above)

    • Overload -> elasticity

• Key questions

    • How well does the system tolerate failure?

    • How well does the system deal with unexpected load?
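
As one possible illustration of "Disks -> integrity checking", the sketch below stores a CRC32 alongside each record and verifies it on read. The record layout and the `pack`/`unpack` names are assumptions made for this example; real databases use their own formats and often stronger checksums.

```python
import zlib


def pack(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 so corruption can be detected on read."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unpack(record: bytes) -> bytes:
    """Return the payload, or raise ValueError if the stored checksum does not match."""
    stored = int.from_bytes(record[:4], "big")
    payload = record[4:]
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: record is corrupt")
    return payload


record = pack(b"hello")
print(unpack(record))  # b'hello'

# Simulate a single flipped bit on disk.
corrupted = record[:-1] + bytes([record[-1] ^ 0x01])
try:
    unpack(corrupted)
except ValueError as err:
    print(err)  # checksum mismatch: record is corrupt
```

A checksum only detects the damage. Recovering the data still depends on having another copy, which is where replication comes in.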

It can be impossible to distinguish between a slow node and a failed node.
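
A small sketch of why this is so, assuming a timeout-based health check: the caller only observes whether a reply arrived before the deadline. The `probe` function and its parameters are hypothetical; the node's behaviour is simulated by a response time, with `None` standing for a node that never answers.

```python
from typing import Optional


def probe(response_time_s: Optional[float], timeout_s: float) -> str:
    """Classify a node from a single health check.

    response_time_s is how long the simulated node would take to reply;
    None means it never replies. The checker cannot see this value, only
    whether a reply arrived within timeout_s.
    """
    if response_time_s is not None and response_time_s <= timeout_s:
        return "alive"
    return "suspect"  # no reply in time: slow or dead, the checker cannot tell


print(probe(0.02, timeout_s=1.0))  # alive
print(probe(5.0, timeout_s=1.0))   # suspect (node is only slow)
print(probe(None, timeout_s=1.0))  # suspect (node is actually dead)
```

Raising the timeout reduces false suspicions of slow nodes but delays detection of dead ones, so the choice is a trade-off rather than a fix.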

UGLY TRUTHS

• All databases require tuning

• Failure is hard to test — most people don’t bother

• Networks fail — especially under high load

• The more your database does, the more ways it can fail

    • More code == more bugs

CHOICES, CHOICES…

• MySQL, Postgres, Oracle

• CouchDB, MongoDB, RethinkDB

• Riak, Cassandra

• HBase, Hypertable

• MemSQL, Couchbase

• Elasticsearch, Solr

• Neo4j, Titan
