SCHEMA ON READ
Index everything: one query type, low latency, high concurrency
Index nothing: queries as programs, high latency, low concurrency
IT’S POPULAR, BUT WHY?
Diverse operational workloads are common
          | Top 5 Marketing Firm | Government Agency | Top 5 Investment Bank
Data      | Key / Value | 10+ fields, arrays, nested documents | 20+ fields, arrays, nested documents
Queries   | Key-based; 1-100 docs / query; 80/20 read/write | Compound queries; range queries; MapReduce; 20/80 read/write | Compound queries; range queries; 50/50 read/write
Servers   | ~250 | ~50 | 4
Ops / Sec | 1,200,000 | 500,000 | 30,000
Some deployments are large
                       | Cluster Scale | Performance Scale | Data Scale
Entertainment Company  | 1,400 servers | 250 Million Ticks / Sec | Petabytes
Asian Internet Company | 1,000+ servers | 300k Ops / Sec | 10s of billions of objects
Federal Agency         | 250+ servers | 500k Ops / Sec | 13 billion documents
Multiple indicators suggest adoption is strong
RANK | DBMS | MODEL | SCORE | GROWTH (20 MO)
1 | Oracle | Relational DBMS | 1,442 | -5%
2 | MySQL | Relational DBMS | 1,294 | 2%
3 | Microsoft SQL Server | Relational DBMS | 1,131 | -10%
4 | MongoDB | Document Store | 277 | 172%
5 | PostgreSQL | Relational DBMS | 273 | 40%
6 | DB2 | Relational DBMS | 201 | 11%
7 | Microsoft Access | Relational DBMS | 146 | -26%
8 | Cassandra | Wide Column | 107 | 87%
9 | SQLite | Relational DBMS | 105 | 19%
Source: DB-engines database popularity rankings; May 2015
Source: Stack Overflow via Stackoverkill.com
TO ME, THREE THINGS DRIVE THIS ADOPTION
We asked users why, here’s what they told us
[Diagram: APPLICATION ({ CODE }) / OBJECT RELATIONAL MAPPING (XML CONFIG) / RELATIONAL DATABASE (DB SCHEMA)]
#1 The data model
RDBMS    | MongoDB
Database | Database
Table    | Collection
Index    | Index
Row      | Document
Join     | Embedding & Linking
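The last row of the mapping, Join versus Embedding & Linking, is the one that changes application code the most. A minimal sketch in plain Python, with dicts standing in for documents: the owner and car values come from the sample document in this deck, while the `_id` values, the `car_ids` field, and the `resolve_cars` helper are invented for illustration.

```python
# Embedding: the cars live inside the owner's document.
person_embedded = {
    "first_name": "Paul",
    "surname": "Miller",
    "cars": [
        {"model": "Bentley", "year": 1973},
        {"model": "Rolls Royce", "year": 1965},
    ],
}

# Linking: cars are separate documents and the owner stores their ids.
# The _id values and this lookup table are invented for the sketch.
cars_by_id = {
    1: {"_id": 1, "model": "Bentley", "year": 1973},
    2: {"_id": 2, "model": "Rolls Royce", "year": 1965},
}
person_linked = {"first_name": "Paul", "surname": "Miller", "car_ids": [1, 2]}


def resolve_cars(person, cars):
    """Application-side 'join': follow the stored ids to the car documents."""
    return [cars[car_id] for car_id in person["car_ids"]]


embedded_models = [car["model"] for car in person_embedded["cars"]]
linked_models = [car["model"] for car in resolve_cars(person_linked, cars_by_id)]
print(embedded_models == linked_models)
```

Embedding reads everything in one lookup; linking costs a second lookup but lets several owners share the same car document.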
Documents are rich data structures
{
  // Fields with typed values
  first_name: 'Paul',                            // String
  surname: 'Miller',
  cell: 447557505611,                            // Number
  city: 'London',
  location: [45.123, 47.232],                    // Geo-Location
  Profession: ['banking', 'finance', 'trader'],  // Fields can contain arrays
  cars: [                                        // Fields can contain an array of sub-documents
    { model: 'Bentley', year: 1973, value: 100000 },
    { model: 'Rolls Royce', year: 1965, value: 330000 }
  ]
}
Documents are self-describing
{
  product_name: 'Acme Paint',
  color: ['Red', 'Green'],
  size_oz: [8, 32],
  finish: ['satin', 'eggshell']
}
{
  product_name: 'T-shirt',
  size: ['S', 'M', 'L', 'XL'],
  color: ['Heather Gray' … ],
  material: '100% cotton',
  wash: 'cold',
  dry: 'tumble dry low'
}
{
  product_name: 'Mountain Bike',
  brake_style: 'mechanical disc',
  color: 'grey',
  frame_material: 'aluminum',
  no_speeds: 21,
  package_height: '7.5x32.9x55',
  weight_lbs: 44.05,
  suspension_type: 'dual',
  wheel_size_in: 26
}
Documents in the same product catalog collection in MongoDB
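Because each document carries its own field names, an application can work out the shape of a collection at read time. A minimal sketch in plain Python, using abridged dicts to stand in for the three catalog documents; the `field_census` helper is hypothetical and not a MongoDB API.

```python
from collections import Counter

# Abridged stand-ins for three documents from the same collection.
catalog = [
    {"product_name": "Acme Paint", "color": ["Red", "Green"],
     "size_oz": [8, 32], "finish": ["satin", "eggshell"]},
    {"product_name": "T-shirt", "size": ["S", "M", "L", "XL"],
     "material": "100% cotton", "wash": "cold", "dry": "tumble dry low"},
    {"product_name": "Mountain Bike", "brake_style": "mechanical disc",
     "color": "grey", "frame_material": "aluminum", "no_speeds": 21},
]


def field_census(docs):
    """Count how many documents contain each top-level field."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.keys())
    return counts


census = field_census(catalog)
# Fields that appear in every document of the (abridged) catalog.
shared = sorted(name for name, n in census.items() if n == len(catalog))
print(shared)
```

In this abridged sample only `product_name` is common to all three documents; every other field is specific to one or two product types.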
#2 Idiomatic drivers & frameworks
Morphia
MEAN Stack
// Java: maps
DBObject query = new BasicDBObject("publisher.founded", 1980);
Map m = collection.findOne(query).toMap();
Date pubDate = (Date) m.get("published_date");

// JavaScript: objects
m = collection.findOne({"publisher.founded": 1980});
pubDate = m.published_date; // ISODate
year = pubDate.getUTCFullYear();

# Python: dictionaries
m = coll.find_one({"publisher.founded": 1980})
pub_year = m["published_date"].year  # published_date is a datetime.datetime
Documents map to language constructs
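The query key "publisher.founded" in the snippets above uses dot notation to reach into a nested document. A rough sketch of how such a path resolves against a Python dict: `get_path` is a hypothetical helper written for illustration, the sample `book` document is invented, and array traversal is left out.

```python
from datetime import datetime


def get_path(doc, dotted_path, default=None):
    """Follow a dotted path such as 'publisher.founded' through nested dicts."""
    current = doc
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Invented sample document shaped like the one the queries above assume.
book = {
    "title": "Example Book",
    "published_date": datetime(2010, 9, 24),
    "publisher": {"name": "Example Press", "founded": 1980},
}

print(get_path(book, "publisher.founded"))
print(book["published_date"].year)
```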
#3 It’s easy…and fun
• Easy to acquire – AGPL license
• Easy to install and configure – up and running in <5 min
• Easy to get high performance – no black magic for millisecond latency, scale out architecture
• Easy to deliver “always on” – replication and automatic failover built in
• Easy to add, query data – no complex modeling, no DDL
BUT WHAT ABOUT
• Data governance?
• Referential integrity?
• Analytics?
DOCUMENT VALIDATION
Data governance: document validation
Implement data governance without sacrificing the agility that comes from schema on read
Document validation gives you flexible control
• Use familiar MongoDB Query Language
• Automatically tests each insert/update; delivers warning or error if a rule is broken
• You choose what keys to validate and how
db.runCommand({
  collMod: "contacts",
  validator: {
    $and: [
      { year_of_birth: { $lte: 1994 } },
      { $or: [
        { phone: { $type: "string" } },
        { email: { $type: "string" } }
      ] }
    ]
  }
})
Example validation failure
db.contacts.insert({ name: "Fred", email: "[email protected]", year_of_birth: 2012 })
Document failed validation
WriteResult({
  "nInserted": 0,
  "writeError": {
    "code": 121,
    "errmsg": "Document failed validation"
  }
})
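To see why this insert is rejected, it helps to evaluate the rule by hand. The sketch below is a toy evaluator covering only the operators used in the contacts validator ($and, $or, $lte, and $type with "string"). It is an illustration in plain Python, not how the server implements validation; the `matches` and `field_matches` functions are hypothetical, and the email value is a placeholder because the address on the slide is not recoverable.

```python
def matches(doc, expr):
    """Return True if doc satisfies expr (tiny subset of the query language)."""
    for key, cond in expr.items():
        if key == "$and":
            if not all(matches(doc, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif not field_matches(doc, key, cond):
            return False
    return True


def field_matches(doc, field, cond):
    """Check one field against an operator document or a literal value."""
    if not isinstance(cond, dict):
        return field in doc and doc[field] == cond
    for op, arg in cond.items():
        if op == "$lte":
            if field not in doc or not doc[field] <= arg:
                return False
        elif op == "$type":
            if arg != "string":
                raise NotImplementedError("only $type: 'string' is sketched")
            if not isinstance(doc.get(field), str):
                return False
        else:
            raise NotImplementedError(op)
    return True


validator = {"$and": [
    {"year_of_birth": {"$lte": 1994}},
    {"$or": [{"phone": {"$type": "string"}},
             {"email": {"$type": "string"}}]},
]}

# Placeholder email; the other values match the failing insert above.
fred = {"name": "Fred", "email": "fred@example.com", "year_of_birth": 2012}
print(matches(fred, validator))
```

Fred has a string email, so the $or branch passes, but 2012 is not less than or equal to 1994, so the $and as a whole fails.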
Many ways to validate, no foreign keys yet
• Can check most things that work with a find expression
  – Existence
  – Non-existence
  – Data type of values
  – <, <=, >, >=, ==, !=
  – AND, OR
  – Regular expressions
  – Some geospatial operators (e.g. $geoWithin & $geoIntersects)
• Validate existing data by wrapping expression in $nor
Where MongoDB validation excels (vs. RDBMS)
• Simple
  – Use familiar search expressions (MQL)
  – No need for stored procedures
• Flexible
  – Only enforced on mandatory parts of the schema
  – Can start adding new data at any point and then add validation later if needed
• Practical to deploy
  – Simple to roll out new rules across thousands of production servers
• Lightweight
  – Negligible impact on performance
Controlling validation
validationAction | validationLevel: off | validationLevel: moderate | validationLevel: strict
warn  | No checks | Warn on validation failure for inserts & updates to existing valid documents. Updates to existing invalid docs OK. | Warn on any validation failure for any insert or update.
error | No checks | Reject invalid inserts & updates to existing valid documents. Updates to existing invalid docs OK. | Reject any violation of validation rules for any insert or update. (DEFAULT)
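The grid above can be read as a small decision function. The following is a sketch of that reading in plain Python, not server code: `check_write` and its parameter names are hypothetical, and "warn" here means the write goes through with a warning.

```python
def check_write(level, action, new_doc_valid, old_doc_valid=None):
    """Return 'accept', 'warn', or 'reject' for one insert or update.

    old_doc_valid is None for an insert; for an update it says whether the
    stored document satisfied the validator before the update.
    """
    if level not in ("off", "moderate", "strict"):
        raise ValueError(f"unknown validationLevel: {level}")
    if action not in ("warn", "error"):
        raise ValueError(f"unknown validationAction: {action}")

    # No checks at all, or nothing to complain about.
    if level == "off" or new_doc_valid:
        return "accept"
    # moderate: updates to documents that were already invalid are let through.
    if level == "moderate" and old_doc_valid is False:
        return "accept"
    return "warn" if action == "warn" else "reject"


# Default settings (strict + error): an invalid insert is rejected.
print(check_write("strict", "error", new_doc_valid=False))
```

Passing `old_doc_valid=False` with `level="moderate"` models the "updates to existing invalid docs OK" cells.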
Versioning of validators (optional)
• Application can lazily update documents with an older version or with no version set at all
db.runCommand({
  collMod: "contacts",
  validator: {
    $or: [
      { version: { "$exists": false } },
      { version: 1, Name: { "$exists": true } },
      { version: 2, Name: { "$type": "string" } }
    ]
  }
})
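One way an application might do the lazy update is to upgrade each document the next time it is read or written. The sketch below assumes the three rules in the validator above (no version, version 1 with Name present, version 2 with Name as a string); the `upgrade` function and the choice to coerce Name with `str()` are assumptions made for illustration, not something the source prescribes.

```python
CURRENT_VERSION = 2


def upgrade(doc):
    """Return a copy of doc brought up to CURRENT_VERSION when possible."""
    if doc.get("version") == CURRENT_VERSION:
        return doc
    if "Name" not in doc:
        # The version 2 rule needs a string Name; without one, return the
        # document unchanged rather than write something that fails validation.
        return doc
    upgraded = dict(doc)
    upgraded["Name"] = str(doc["Name"])   # version 2 requires a string
    upgraded["version"] = CURRENT_VERSION
    return upgraded


print(upgrade({"Name": 42}))
```

A caller would typically write the upgraded copy back only when it differs from what was read.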
SCHEMA DISCOVERY
FUTURE DECISIONS
Still lots of hard problems to solve
• Schema evolution
• Specialized storage engines
  – WORM
  – Blockchain
  – Proprietary hardware
  – Integrated data warehouse
• Complex transactions
One surface fits all
[Diagram, top to bottom]
Applications: Content Repo, IoT Sensor Backend, Ad Service, Customer Analytics, Archive
MongoDB Query Language (MQL) + Native Drivers
MongoDB Document Data Model
Storage engines: BTree, LSM, In-memory, WORM, Archive
Alongside all layers: Management, Security