When NOT to use MongoDB

Upload: mike-michaud

Post on 08-Feb-2017

Category:

Data & Analytics


TRANSCRIPT

Page 1: When NOT to use MongoDB

When NOT to use MongoDB

Page 2: When NOT to use MongoDB

MongoDB (NoSQL)

• What is it?

• How Does it Work?

• Where did it come from?

Page 3: When NOT to use MongoDB

MongoDB (NoSQL): What is it?

SQL (relational): Tables of data with defined fields, related by foreign keys and join tables.

MongoDB: Collections of JSON-like documents which can have any data structure at all.
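To make the contrast concrete, here is a minimal sketch that stores the same blog post both ways. The table names, field names and sample data are invented for illustration. The relational side uses Python's built-in sqlite3 module, and the document side is a plain Python dict standing in for a MongoDB document.

```python
import json
import sqlite3

# Relational: fixed columns, rows tied together by a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE comments (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES posts(id),
        body TEXT NOT NULL
    );
    INSERT INTO posts VALUES (1, 'Hello');
    INSERT INTO comments VALUES (1, 1, 'First!');
    INSERT INTO comments VALUES (2, 1, 'Nice post');
""")
rows = conn.execute(
    "SELECT p.title, c.body FROM posts p "
    "JOIN comments c ON c.post_id = p.id ORDER BY c.id"
).fetchall()

# Document: one JSON-like record with the related data nested inside it ...
post_doc = {
    "_id": 1,
    "title": "Hello",
    "comments": [{"body": "First!"}, {"body": "Nice post"}],
}
# ... and nothing forces the next document to have the same shape.
other_doc = {"_id": 2, "headline": "No title field here", "tags": ["misc"]}

print(rows)                  # joined rows from two tables
print(json.dumps(post_doc))  # one self-contained document
```

The relational version needs a join to reassemble the post, while the document version is read in one piece but gives no guarantee that `other_doc` looks anything like `post_doc`.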

Page 4: When NOT to use MongoDB

MongoDB (NoSQL): How does it work?

Relationships

• NO join queries. Reading one doc is MUCH faster. Reading several related docs can be MUCH slower and probably more complicated.

• Linking through storing an array of related doc ids.

• No querying embedded documents as standalone records; a query hands back the parent docs that contain them, so you have to pull 'em all. (How many posts has Joe ‘Liked’?)

• Either denormalization or link id arrays or both can quickly get out of control.
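The id-array linking described above can be sketched with plain Python lists of dicts standing in for two collections. The collection names, the `liked_by` field and the helper functions are hypothetical, chosen only to illustrate the "How many posts has Joe liked?" question.

```python
# Two "collections" as lists of dicts (stand-ins for MongoDB collections).
users = [
    {"_id": "joe", "name": "Joe"},
    {"_id": "ann", "name": "Ann"},
]
posts = [
    {"_id": 1, "title": "A", "liked_by": ["joe", "ann"]},
    {"_id": 2, "title": "B", "liked_by": ["ann"]},
    {"_id": 3, "title": "C", "liked_by": ["joe"]},
]

def count_likes(user_id, posts):
    """Count the posts a user liked by inspecting every post's id array."""
    return sum(1 for post in posts if user_id in post["liked_by"])

def likers_of(post, users):
    """Resolve a post's id array to user docs: a manual, application-side join."""
    by_id = {u["_id"]: u for u in users}
    return [by_id[uid] for uid in post["liked_by"] if uid in by_id]

joe_likes = count_likes("joe", posts)
first_post_likers = [u["name"] for u in likers_of(posts[0], users)]
print(joe_likes)          # 2
print(first_post_likers)  # ['Joe', 'Ann']
```

Both helpers live in application code, which is the point: the relationship is maintained and resolved by you, not by the database.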

Page 5: When NOT to use MongoDB

MongoDB (NoSQL): How does it work?

CAP Theorem

C: Consistency ~ All clients always have the same view of the data.

A: Availability ~ All clients can always read and write.

P: Partition Tolerance ~ The system can survive and function while parts are separated, for the sake of scaling or by failures of servers or the network.

(A Little Background)

Page 6: When NOT to use MongoDB

MongoDB (NoSQL): How does it work?

Define ‘Partition’

• Some separation of members of the system.

• DB Partition: Vertical or horizontal (sharding) division of the database on to separate servers for the sake of scaling

• Fault Partition: A failure (of one or more) of the database servers, or of the network between them, or of the connection to the client.

• All require a reconciliation, either ongoing, or when the system is restored, called ‘recovery’.

(A Little Background)

Page 7: When NOT to use MongoDB

MongoDB (NoSQL): How does it work?

Strong Consistency, at the expense of:

• Availability: If one of the database shards goes down, or the network between them goes down, or we just need to sync up, everything stops. We MUST make sure everyone will see the same thing. Client(s) must wait.

• Partition Tolerance: It is very difficult, or impossible, to separate the database. One machine, one socket, one client at a time. It's the only way to be sure.

(A Little Background)

Page 8: When NOT to use MongoDB

MongoDB (NoSQL): How does it work?

Strong Availability, at the expense of:

• Consistency: If one of the database shards or replicas goes down or becomes unreachable, we change or serve what we can for the given client.

• Partition tolerance: Because we changed or served what we could, given what we knew with part of the db down, we may not be able to restore a complete and correct DB state once things link back up, but we don’t really care.

(A Little Background)

Page 9: When NOT to use MongoDB

MongoDB (NoSQL): How does it work?

Strong Partition Tolerance, at the expense of:

• Consistency: We abandon the requirement of consistent representation of data to all clients in favor of doing the best we can while stuff is on separate servers, or while something is broken.

• Availability: Our focus is on making a system that can be split up onto many servers. If we require consistency as well, then availability must suffer, since all server shards MUST be in sync before we serve anything. Clients must wait.

(A Little Background)
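The trade-off during a fault partition can be shown with a toy model. This is only a sketch: the `Replica` class, the `write` function and its `prefer` parameter are made up for illustration and do not correspond to any real database API.

```python
class Replica:
    """One copy of a single value."""
    def __init__(self):
        self.value = None

def write(replicas, reachable, value, prefer):
    """Attempt a write while only the replicas at indices in `reachable` respond.

    prefer="consistency": refuse unless every replica can be updated.
    prefer="availability": update whatever is reachable and accept divergence.
    Returns True if the write was accepted.
    """
    if prefer == "consistency" and len(reachable) < len(replicas):
        return False  # the client must wait until the partition heals
    for i in reachable:
        replicas[i].value = value
    return True

# A network partition leaves only replica 0 reachable.
cp = [Replica(), Replica()]
cp_accepted = write(cp, reachable=[0], value="x", prefer="consistency")

ap = [Replica(), Replica()]
ap_accepted = write(ap, reachable=[0], value="x", prefer="availability")

print(cp_accepted, [r.value for r in cp])  # False [None, None]
print(ap_accepted, [r.value for r in ap])  # True ['x', None]
```

The consistency-first system stays correct but turns the client away; the availability-first system answers, and now has two replicas that disagree and must be reconciled during recovery.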

Page 10: When NOT to use MongoDB

MongoDB (NoSQL): How does it work?

CAP Theorem (A Little Background)

Page 11: When NOT to use MongoDB

DB Stoichiometry

ACID vs. BASE

A: Atomicity ~ Transactions are all or nothing. If anything fails at any point, no change is made.

C: Consistency ~ Database will NEVER exist in an invalid state. It will transition from valid state to valid state only.

I: Isolation ~ Database will maintain a state which represents all transactions having occurred in isolation and serially (thread safety).

D: Durability ~ All transactions, once committed, will survive failures, crashes, power outages, etc.
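Atomicity and consistency are easiest to see in a small transaction. The sketch below uses Python's built-in sqlite3 module as a stand-in for any ACID database; the `accounts` table, the `transfer` helper and the balances are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two rows; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # An overdraft violates the CHECK constraint and raises IntegrityError.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))

ok = transfer(conn, "alice", "bob", 30)
after_ok = balances(conn)
overdraft = transfer(conn, "alice", "bob", 500)
after_overdraft = balances(conn)
print(ok, after_ok)
print(overdraft, after_overdraft)
```

In the failed transfer the credit to `bob` has already been applied when the debit is rejected, yet the rollback removes it, so the database never shows money that came from nowhere.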

Page 12: When NOT to use MongoDB

DB Stoichiometry

ACID vs. BASE

BA: Basically Available ~ Focuses on availability despite multiple system failures, at the expense of consistency.

S: Soft State ~ DB abdicates the responsibility of maintaining ‘valid’ data state. Control of data structure is relegated to a higher level in the stack.

E: Eventually Consistent ~ If all goes well, all clients on all partitions will eventually be seeing the same thing. Probably.

Page 13: When NOT to use MongoDB

MongoDB (NoSQL): How does it work?

The Nitty Gritty

• No multi-document transactions in MongoDB versions before 4.0. (Think of a transfer between two bank accounts.)

• Writes are applied in RAM, journaled usually within 100 ms (the default), and flushed to the db files on disk within a minute or so (usually).

• NOT irresponsible, just differently optimized. Misusing it is irresponsible.
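The two-bank-accounts case can be sketched without any database at all. Below, each account is its own document (a plain dict), and the transfer is two independent single-document updates. The `Crash` exception and the `crash_between` flag are hypothetical devices for simulating a failure between the two writes.

```python
accounts = {
    "alice": {"_id": "alice", "balance": 100},
    "bob": {"_id": "bob", "balance": 0},
}

class Crash(Exception):
    """Simulated process or server failure."""

def transfer_two_steps(accounts, src, dst, amount, crash_between=False):
    """Two independent single-document updates with nothing tying them together."""
    accounts[src]["balance"] -= amount  # update 1: applied on its own
    if crash_between:
        raise Crash("failed after the first update")
    accounts[dst]["balance"] += amount  # update 2: never runs after a crash

try:
    transfer_two_steps(accounts, "alice", "bob", 30, crash_between=True)
except Crash:
    pass

total = sum(doc["balance"] for doc in accounts.values())
print(accounts["alice"]["balance"], accounts["bob"]["balance"], total)  # 70 0 70
```

The system started with 100 and now holds 70: the debit landed, the credit did not, and nothing rolls the first update back. Any repair has to come from application code.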

Page 14: When NOT to use MongoDB

MongoDB (NoSQL): Where did it come from?

• Indexing the interwebs

Page 15: When NOT to use MongoDB

Why we love MongoDB:

• Extraordinarily easy (nearly automatic) to scale up to massive database clusters without losing much performance

• Unstructured data provides HUGE flexibility at every stage of an application's lifecycle

• Lots of useful, built-in data aggregation tools

• Everyone else seems to love it, how can everyone else be wrong?

Page 16: When NOT to use MongoDB

Important weaknesses of MongoDB:

• NO multi-document transactions before MongoDB 4.0. Period.

• NO DB level joins. Period. (No relating embedded records without manual tricks)

• EVENTUAL Durability. Probably. (Caching nightmares, risks increase with increased collection interrelationships)

• Unstructured data is awesome. And Terrifying. “With great power…”
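When the database does not enforce structure, the checking moves into the application. The sketch below shows one minimal way that might look; the `REQUIRED_FIELDS` schema and the `validate` helper are hypothetical, not part of any MongoDB driver.

```python
# Hypothetical shape the application expects every post document to have.
REQUIRED_FIELDS = {"title": str, "author": str, "likes": int}

def validate(doc):
    """Return a list of problems; an empty list means the doc has the expected shape."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            problems.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems

good = {"title": "Hello", "author": "joe", "likes": 3}
drifted = {"title": "Hello", "author": "joe", "likes": "3", "extra": True}
incomplete = {"headline": "Hello"}

print(validate(good))        # []
print(validate(drifted))     # ['likes should be int']
print(validate(incomplete))  # three "missing field" entries
```

All three documents could sit side by side in a schemaless collection without complaint. Whether `drifted` and `incomplete` ever get caught depends entirely on code like this being written, run on every write path, and kept up to date.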

Page 17: When NOT to use MongoDB

Bottom Line

Do NOT use MongoDB if:

• You know your data will be highly relational, and you can foresee wanting to run novel, varied, highly interrelated queries more than occasionally.

• You will have multi-doc transactions that MUST be all or nothing.

• You will be storing data that MUST NOT get lost or corrupted.

• It is imperative that all clients see the same thing when they query the DB or DB clusters.

• You are concerned about the long-term cost of managing all of the responsibilities and risks of storing data without a schema.

• Remember that you can use MongoDB for less critical data, and SQL for the sensitive stuff.