Agility and Scalability with MongoDB
DESCRIPTION
MongoDB has taken a clear lead in adoption among the new generation of databases, including the enormous variety of NoSQL offerings. A key reason for this lead has been a unique combination of agility and scalability. Agility provides business units with a quick start and flexibility to maintain development velocity, despite changing data and requirements. Scalability maintains that flexibility while providing fast, interactive performance as data volume and usage increase. We'll address the key organizational, operational, and engineering considerations to ensure that agility and scalability stay aligned at increasing scale, from small development instances to web-scale applications. We will also survey some key examples of highly-scaled customer applications of MongoDB.
TRANSCRIPT
![Page 2: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/2.jpg)
Data Challenge: “I want my data...”
• Now
• Secure
• All varieties
• Fast and interactive
• Scalable to “Big”
• Agile to develop and deploy operationally
• Cloud and edge
iStock licensed (pixelfit)
![Page 3: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/3.jpg)
Scalability with MongoDB

| Metric | Meaning | Examples |
|---|---|---|
| Operations per Second | Concurrent reads and writes per second | > 1 million per second |
| Nodes per Cluster | Horizontal scale-out, distributed to multiple data centers worldwide, with high availability, using inexpensive cloud resources | > 1000 nodes |
| Records / Documents | Data objects in any number of schemas or structures | > 10 billion |
| Data Volume | Total amount of data: documents × size | > 1 petabyte = 10^15 bytes = 1,000,000,000,000,000 bytes ≈ 2^50 bytes |
![Page 4: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/4.jpg)
Key Differentiation
![Page 5: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/5.jpg)
Operational Database Landscape
![Page 6: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/6.jpg)
Document Data Model
[Figure: Relational vs. MongoDB]
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
![Page 7: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/7.jpg)
Documents are Rich Data Structures
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: '+447557505611',
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
Callouts:
• Fields
• Typed field values: String, Number, Geo-Coordinates
• Fields can contain arrays
• Fields can contain an array of sub-documents
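As a rough illustration of the callouts above, the same document can be held in plain Java collections: a map for the document, lists for arrays, and nested maps for sub-documents. This is only a sketch; the class and method names (`RichDocumentSketch`, `paul`, `carModel`) are hypothetical, no MongoDB driver is involved, and the fields elided with "…" on the slide are left out.

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration: the slide's document expressed with plain Java collections.
class RichDocumentSketch {

    // Builds the "Paul Miller" document from the slide (elided fields omitted).
    static Map<String, Object> paul() {
        return Map.of(
            "first_name", "Paul",                                   // string field
            "surname", "Miller",
            "cell", "+447557505611",
            "city", "London",
            "location", List.of(45.123, 47.232),                    // array of numbers
            "Profession", List.of("banking", "finance", "trader"),  // array of strings
            "cars", List.of(                                        // array of sub-documents
                Map.of("model", "Bentley", "year", 1973, "value", 100000),
                Map.of("model", "Rolls Royce", "year", 1965, "value", 330000)));
    }

    // Reads the model of the car at the given position in the "cars" array.
    @SuppressWarnings("unchecked")
    static String carModel(Map<String, Object> doc, int index) {
        List<Map<String, Object>> cars = (List<Map<String, Object>>) doc.get("cars");
        return (String) cars.get(index).get("model");
    }

    public static void main(String[] args) {
        System.out.println(carModel(paul(), 1)); // prints "Rolls Royce"
    }
}
```

The whole record, including the owner's cars, travels as one nested value, which is the property the slide contrasts with a relational layout.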
![Page 8: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/8.jpg)
Document Model Benefits
• Agility and flexibility
  – Data model supports business change
  – Rapidly iterate to meet new requirements
• Intuitive, natural data representation
  – Eliminates ORM layer
  – Developers are more productive
• Reduces the need for joins, disk seeks
  – Programming is simpler
  – Performance delivered at scale
![Page 9: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/9.jpg)
Big Data Tech Interest Comparison
j.mp/Ssvpev
![Page 11: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/11.jpg)
Architecture for Availability & Scalability
![Page 12: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/12.jpg)
Replica Sets
• Replica Set – two or more copies
• Availability solution
  – High Availability
  – Disaster Recovery
  – Maintenance
• Deployment Flexibility
  – Data locality to users
  – Workload isolation: operational & analytics
• Self-healing shard
[Diagram labels: Application, Driver, Primary, Secondary, Secondary, Replication]
![Page 13: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/13.jpg)
Global Data Distribution
[Diagram: one Primary and seven Secondaries, each link labeled "Real-time"]
![Page 14: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/14.jpg)
Automatic Sharding
• Sharding types
  – Range
  – Hash
  – Tag-aware
• Elastic increase or decrease in capacity
• Automatic balancing
![Page 15: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/15.jpg)
Query Routing
• Multiple query optimization models
• Each sharding option appropriate for different apps
![Page 16: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/16.jpg)
Performance
![Page 17: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/17.jpg)
Drag Strip: straight ahead, quarter-mile, stop
![Page 18: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/18.jpg)
Road Race: stay fast, stay agile, continuous
Nürburgring, Germany
![Page 19: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/19.jpg)
MongoDB at Scale
![Page 20: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/20.jpg)
CARFAX
• Large data set
![Page 21: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/21.jpg)
In-depth NoSQL evaluation
Baseline
• Vehicle History Database
• 11 billion records (growing at 1 billion per year)
• 30-year-old VMS-based RDBMS
• Cumbersome
• Costly
MongoDB Comparison
• Performance: 4x faster than baseline, 10x key-value
• Scale out using inexpensive commodity servers
• Built-in redundancy
• Flexible dynamic schema data model
• Strong consistency
• Analytics/aggregation
Initial Production
• MongoDB is primary data store
• 50 servers
• 10 shards
• 5-node replica sets per shard
![Page 22: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/22.jpg)
CARFAX Sharding and Replication
• 13 billion+ documents
  – 1.5 billion documents added every year
• 1 vehicle history report is > 200 documents
• 12 shards
• 9-node replica sets
• Replicas distributed across 3 data centers
![Page 23: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/23.jpg)
CARFAX Replication
![Page 24: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/24.jpg)
![Page 25: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/25.jpg)
Foursquare
• 50M users
• 6B check-ins to date (6M per day growth)
• 55M points of interest / venues
• 1.7M merchants using the platform for marketing
• Operations per second: 300,000
• Documents: 5.5B (~16.5B with replication)*
![Page 26: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/26.jpg)
Foursquare clusters
• 11 MongoDB clusters
  – 8 are sharded
• Largest cluster is for check-ins
  – 15 shards
  – Shard key: user_id
![Page 27: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/27.jpg)
Facebook / parse.com mobile apps
• Persistent database for 270,000 mobile applications
• 200 M end-user mobile devices
• 250% annual growth in client apps
• 500% growth in requests
• 1.5 M collections
• Key differentiators:
– Document data model
– High performance and availability
– Geospatial query and index
• Charity Majors operations: j.mp/X3jVRC
– Understand your database and your data, and build for them.
![Page 28: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/28.jpg)
![Page 29: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/29.jpg)
Scalability Exercises in the Cloud with Amazon Web Services
![Page 30: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/30.jpg)
Petascale Database
• 27x hs1.8xlarge instances
  – 16x VCPU
  – 24x 2TB SATA drives, RAID0
  – 8x mongod microshards
• Modified Yahoo Cloud Serving Benchmark (YCSB)
  – Long Integer IDs (>2B)
  – Zipfian-distributed integer fields
  – Aggregation queries
• Load direct to 216 shards, 10 days, $4K
Resulting statistics (commas added):
"objects" : 7,170,648,489,
"avgObjSize" : 147,438.99952658816,
"dataSize" : NumberLong("1,057,240,224,818,640")
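A quick arithmetic check of the figures on this slide, as a sketch only: the `dataSize` value and the instance counts are taken from the slide, while the class and method names are hypothetical. It compares the reported size against the decimal petabyte (10^15 bytes) and the binary pebibyte (2^50 bytes), and recomputes the shard count from 27 instances with 8 mongod processes each.

```java
// Hypothetical sketch: sanity-checks the petascale figures quoted on the slide.
class PetabyteCheck {
    static final long DATA_SIZE = 1_057_240_224_818_640L; // "dataSize" from the slide, in bytes
    static final long PETABYTE  = 1_000_000_000_000_000L; // 10^15 bytes
    static final long PEBIBYTE  = 1L << 50;               // 2^50 = 1,125,899,906,842,624 bytes

    static boolean exceedsDecimalPetabyte() {
        return DATA_SIZE > PETABYTE;
    }

    static boolean exceedsBinaryPebibyte() {
        return DATA_SIZE > PEBIBYTE;
    }

    // Total shards when every instance hosts the same number of mongod processes.
    static int shardCount(int instances, int mongodPerInstance) {
        return instances * mongodPerInstance;
    }

    public static void main(String[] args) {
        System.out.println("over 10^15 bytes: " + exceedsDecimalPetabyte()); // true
        System.out.println("over 2^50 bytes:  " + exceedsBinaryPebibyte());  // false
        System.out.println("shards: " + shardCount(27, 8));                  // 216
    }
}
```

So the load passes one petabyte in the decimal sense used in the scalability table, while remaining below 2^50 bytes.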
![Page 31: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/31.jpg)
CGroup Memory Segregation
for DB in `seq 0 3`; do
  sudo cgcreate \
    -a mongodb:mongodb \
    -t mongodb:mongodb \
    -g memory:mongodb$DB
  # "sudo echo … > file" would perform the redirect as the calling user; tee writes as root
  echo 48G | sudo tee \
    /sys/fs/cgroup/memory/mongodb$DB/memory.limit_in_bytes
  cgexec \
    -g memory:mongodb$DB \
    numactl --interleave=all \
    mongod --config ~/mongod$DB.conf
done
![Page 32: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/32.jpg)
Megawrite Ingest
• Ingest 250-byte stock quotes at 2M/s
• Concurrently run 5 QPS, subsecond/indexed response on timeStamp, accountId, instrumentId, systemKey
• 5x r3.4xlarge
  – 16x VCPU, 1x 320GB SSD, 122GB RAM, 16x mongod
  – 2.1M insert/second direct to shards
• 16x c3.8xlarge
  – 32x VCPU, 2x 320GB SSD, 60GB RAM, 16x mongod, 4x mongos
  – 2.1M insert/second via mongos
![Page 33: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/33.jpg)
Java API comparison
• 2 threads on c3.8xl
• 264 bsonsize object, _id index only
• coll.insert(): 15,600 ins / sec
• coll.insert(List<DBObject>), listsize = 64: 118,000 ins / sec
• Bulk ops API, size = 64: 120,000 ins / sec
![Page 34: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/34.jpg)
7x Load with BulkOp
BulkWriteOperation bo = null;
for (a = 0; a < this.items && stayAlive; a++) {
    if (bo == null) {
        bo = collection.initializeUnorderedBulkOperation();
    }
    fillMap(this.m);
    BasicDBObject dbObject = new BasicDBObject(this.m);
    bo.insert(dbObject);
    // Flush once the batch holds listsize documents (a is zero-based).
    if (0 == (a + 1) % listsize) {
        BulkWriteResult rc = bo.execute();
        bo = null;
    }
}
// Flush the final partial batch, if any.
if (bo != null) {
    bo.execute();
}
![Page 35: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/35.jpg)
How do I Pick A Shard Key?
![Page 36: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/36.jpg)
Shard Key characteristics
• A good shard key has:
  – sufficient cardinality
  – distributed writes
  – targeted reads ("query isolation")
• Shard key should be in every query if possible
  – scatter gather otherwise
• Choosing a good shard key is important!
  – affects performance and scalability
  – changing it later is expensive
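The difference between a targeted read and a scatter gather can be sketched with a toy router. This is not how mongos is implemented; it is a minimal model, assuming range-based chunks keyed by an integer shard key, and the class name, chunk boundaries, and shard names are all made up for illustration.

```java
import java.util.OptionalInt;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy router: maps a query to the shards it has to visit.
class QueryRoutingSketch {
    // Chunk lower bound (inclusive) -> owning shard; an assumed layout with three chunks.
    private final TreeMap<Integer, String> chunks = new TreeMap<>();

    QueryRoutingSketch() {
        chunks.put(Integer.MIN_VALUE, "shard1"); // [minKey, 1000)
        chunks.put(1000, "shard2");              // [1000, 2000)
        chunks.put(2000, "shard3");              // [2000, maxKey)
    }

    // With a shard key value the query is targeted at one shard;
    // without one it must be sent to every shard (scatter gather).
    Set<String> shardsFor(OptionalInt shardKeyValue) {
        if (shardKeyValue.isPresent()) {
            // floorEntry finds the chunk whose lower bound is the greatest one <= the key
            return Set.of(chunks.floorEntry(shardKeyValue.getAsInt()).getValue());
        }
        return new TreeSet<>(chunks.values());
    }

    public static void main(String[] args) {
        QueryRoutingSketch router = new QueryRoutingSketch();
        System.out.println(router.shardsFor(OptionalInt.of(1500))); // [shard2]
        System.out.println(router.shardsFor(OptionalInt.empty()));  // [shard1, shard2, shard3]
    }
}
```

In this model the cost of a query without the shard key grows with the number of shards, which is why the slide recommends including the key in every query where possible.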
![Page 37: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/37.jpg)
Hashed shard key
• Pros:
  – Evenly distributed writes
• Cons:
  – Random data (and index) updates can be IO intensive
  – Range-based queries turn into scatter gather
[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, … Shard N]
![Page 38: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/38.jpg)
Low cardinality shard key
• Induces "jumbo chunks"
• Example: boolean field
[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, … Shard N; one chunk [ a, b )]
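One way to see why a boolean shard key leads to jumbo chunks: a chunk covers a range of shard key values, so documents that share a single value cannot be separated into different chunks. The sketch below, with hypothetical class and method names and assuming a non-empty list of key values, counts the resulting upper bound on chunks and the best-case number of documents per chunk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: a low-cardinality shard key caps the number of chunks.
class JumboChunkSketch {

    // Upper bound on the number of chunks: at most one per distinct shard key value.
    static <K extends Comparable<K>> int maxChunks(List<K> shardKeyValues) {
        return new TreeSet<>(shardKeyValues).size();
    }

    // Best-case average documents per chunk when the maximum number of chunks is used.
    // Assumes shardKeyValues is not empty.
    static <K extends Comparable<K>> long bestCaseDocsPerChunk(List<K> shardKeyValues) {
        return shardKeyValues.size() / maxChunks(shardKeyValues);
    }

    public static void main(String[] args) {
        List<Boolean> flags = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            flags.add(i % 2 == 0); // boolean shard key: only two distinct values
        }
        System.out.println(maxChunks(flags));            // 2
        System.out.println(bestCaseDocsPerChunk(flags)); // 500000
    }
}
```

However many documents are added, the two chunks only grow, and no number of additional shards can take load off them.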
![Page 39: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/39.jpg)
Ascending shard key
• Monotonically increasing shard key values cause "hot spots" on inserts
• Examples: timestamps, _id
[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, … Shard N; inserts land in chunk [ ISODate(…), $maxKey )]
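The hot spot can be reproduced in a small simulation. This is a sketch under stated assumptions: four shards, fixed range boundaries chosen for illustration, and a simple multiplicative mixing function standing in for a hash (it is not the hash function MongoDB uses). All names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical simulation: ascending keys under range vs. hashed placement.
class AscendingKeySketch {
    static final int SHARDS = 4;
    // Assumed chunk upper bounds; the last chunk is [3000, maxKey).
    static final long[] UPPER_BOUNDS = {1000, 2000, 3000};

    // Range placement: the first chunk whose upper bound exceeds the key.
    static int rangeShard(long key) {
        for (int i = 0; i < UPPER_BOUNDS.length; i++) {
            if (key < UPPER_BOUNDS[i]) {
                return i;
            }
        }
        return UPPER_BOUNDS.length; // everything above the top bound shares one chunk
    }

    // Hashed placement using a stand-in mixing function (NOT MongoDB's hash).
    static int hashedShard(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // multiplicative scramble, wraps on overflow
        h ^= (h >>> 32);                    // fold the high bits into the low bits
        return (int) Math.floorMod(h, (long) SHARDS);
    }

    // Counts how many of `inserts` consecutive keys land on each shard.
    static int[] distribute(boolean hashed, long firstKey, int inserts) {
        int[] counts = new int[SHARDS];
        for (long k = firstKey; k < firstKey + inserts; k++) {
            counts[hashed ? hashedShard(k) : rangeShard(k)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // New keys start above every existing boundary, as timestamps or ObjectIds would.
        System.out.println("range:  " + Arrays.toString(distribute(false, 5000, 10_000)));
        System.out.println("hashed: " + Arrays.toString(distribute(true, 5000, 10_000)));
    }
}
```

Under range placement every insert goes to the last shard, which is the hot spot the slide describes; hashing the same keys spreads them over several shards, at the cost of the range-query drawback listed on the hashed shard key slide.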
![Page 40: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/40.jpg)
Ensuring Success with High Scalability
![Page 41: Agility and Scalability with MongoDB](https://reader034.vdocuments.us/reader034/viewer/2022051817/547e89a8b379596f2b8b5493/html5/thumbnails/41.jpg)
Success Factors
• Storage: random seeks (IOPS)
• RAM: working set based on query patterns
• Query: indexing
• Delete: most expensive operation
• Real-time vs. bulk operations
• Continuity: HA, DR, backup, restore
• Agile process: iterate by powers of 4
• Sharding: shard key and strategy
• Resources: don’t go it alone!