Agility and Scalability with MongoDB
Data Challenge: “I want my data...”
• Now
• Secure
• All varieties
• Fast and interactive
• Scalable to “Big”
• Agile to develop and deploy operationally
• Cloud and edge
Scalability with MongoDB
| Metric | Meaning | Examples |
| --- | --- | --- |
| Operations per Second | Concurrent reads and writes per second | > 1 million per second |
| Nodes per Cluster | Horizontal scale-out, distributed to multiple data centers worldwide, with high availability, using inexpensive cloud resources | > 1,000 nodes |
| Records / Documents | Data objects in any number of schemas or structures | > 10 billion |
| Data Volume | Total amount of data: documents × size | > 1 petabyte = 10^15 bytes = 1,000,000,000,000,000 bytes ≈ 2^50 bytes |
Key Differentiation
Operational Database Landscape
Document Data Model
Relational MongoDB
{
  first_name: 'Paul',
  surname: 'Miller',
  city: 'London',
  location: [45.123, 47.232],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
Documents are Rich Data Structures
{
  first_name: 'Paul',
  surname: 'Miller',
  cell: '+447557505611',
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000, … },
    { model: 'Rolls Royce', year: 1965, value: 330000, … }
  ]
}
• Fields
• Typed field values: String, Number, Geo-Coordinates
• Fields can contain arrays
• Fields can contain an array of sub-documents
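As a rough illustration of the structure described on this slide, the example document can be modeled with nothing more than nested maps and lists. This is a minimal sketch using only the Java standard library; the class name `RichDocumentSketch` and method `buildPerson` are made up for illustration and are not part of any MongoDB driver. The elided fields ("…") in the slide are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: models the slide's document with plain collections.
class RichDocumentSketch {

    /** Builds the example person as nested maps (sub-documents) and lists (arrays). */
    static Map<String, Object> buildPerson() {
        Map<String, Object> bentley = new LinkedHashMap<>();
        bentley.put("model", "Bentley");
        bentley.put("year", 1973);
        bentley.put("value", 100000);

        Map<String, Object> rollsRoyce = new LinkedHashMap<>();
        rollsRoyce.put("model", "Rolls Royce");
        rollsRoyce.put("year", 1965);
        rollsRoyce.put("value", 330000);

        // A field holding an array of sub-documents
        List<Map<String, Object>> cars = new ArrayList<>();
        cars.add(bentley);
        cars.add(rollsRoyce);

        Map<String, Object> person = new LinkedHashMap<>();
        person.put("first_name", "Paul");                        // string value
        person.put("surname", "Miller");
        person.put("cell", "+447557505611");
        person.put("city", "London");
        person.put("location", Arrays.asList(45.123, 47.232));   // coordinate pair
        person.put("Profession", Arrays.asList("banking", "finance", "trader")); // array field
        person.put("cars", cars);
        return person;
    }

    public static void main(String[] args) {
        System.out.println(buildPerson());
    }
}
```

Because the whole person, including the cars, lives in one object, reading it back needs no join across separate tables.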
Document Model Benefits
• Agility and flexibility
– Data model supports business change
– Rapidly iterate to meet new requirements
• Intuitive, natural data representation
– Eliminates ORM layer
– Developers are more productive
• Reduces the need for joins, disk seeks
– Programming is simpler
– Performance delivered at scale
Big Data Tech Interest Comparison
j.mp/Ssvpev
Architecture for Availability & Scalability
Replica Sets
• Replica Set – two or more copies
• Availability solution
– High Availability
– Disaster Recovery
– Maintenance
• Deployment Flexibility
– Data locality to users
– Workload isolation: operational & analytics
• Self-healing shard
[Diagram labels: Application, Driver, Primary, Secondary, Secondary, Replication]
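The "self-healing" behavior depends on the surviving members being able to elect a new primary. The following is a minimal sketch of that arithmetic, assuming majority-based elections among voting members as described in MongoDB's replica set documentation; the class and method names are illustrative, not driver API.

```java
// Illustrative only: majority arithmetic for a replica set's voting members.
class ReplicaSetMajoritySketch {

    /** Votes needed to elect a primary: strictly more than half of the voting members. */
    static int majority(int votingMembers) {
        if (votingMembers < 1) {
            throw new IllegalArgumentException("votingMembers must be >= 1");
        }
        return votingMembers / 2 + 1;
    }

    /** Number of voting members that can be lost while a primary can still be elected. */
    static int faultTolerance(int votingMembers) {
        return votingMembers - majority(votingMembers);
    }

    public static void main(String[] args) {
        for (int n : new int[] {2, 3, 5, 7}) {
            System.out.println(n + " voting members: majority " + majority(n)
                    + ", tolerates " + faultTolerance(n) + " failure(s)");
        }
    }
}
```

Under this assumption a two-member set tolerates no failures, which is one reason odd member counts such as three or five are common.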
Global Data Distribution
[Figure: one Primary and seven Secondaries; the figure carries seven "Real-time" labels]
Automatic Sharding
• Sharding types
– Range
– Hash
– Tag-aware
• Elastic increase or decrease in capacity
• Automatic balancing
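To make the difference between range and hash sharding concrete, here is a small sketch of how a single key might be mapped to a shard under each scheme. It is a simplified model, not MongoDB's implementation: the mixing function is a stand-in for the real hash, one chunk per shard is assumed, and tag-aware sharding is left out. All names are hypothetical.

```java
import java.util.Arrays;

// Illustrative only: simplified key-to-shard mapping for range and hash sharding.
class ShardRoutingSketch {

    /**
     * Range sharding: shard i owns [sortedLowerBounds[i], sortedLowerBounds[i + 1]).
     * The first bound must be Long.MIN_VALUE so every key falls in some range.
     */
    static int rangeShard(long key, long[] sortedLowerBounds) {
        int idx = Arrays.binarySearch(sortedLowerBounds, key);
        // When the key is not a bound, binarySearch returns -(insertionPoint) - 1;
        // the owning range is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    /** Hash sharding: scramble the key, then take it modulo the shard count. */
    static int hashShard(long key, int shardCount) {
        long h = key * 0x9E3779B97F4A7C15L; // stand-in mixing step, not MongoDB's hash
        h ^= (h >>> 32);
        return (int) Math.floorMod(h, (long) shardCount);
    }

    public static void main(String[] args) {
        long[] bounds = {Long.MIN_VALUE, 100, 200};
        for (long key : new long[] {50, 100, 150, 200, 5000}) {
            System.out.println("key " + key + ": range shard " + rangeShard(key, bounds)
                    + ", hash shard " + hashShard(key, 3));
        }
    }
}
```

Range sharding keeps neighboring keys together, which suits range scans; hashing deliberately breaks that locality in exchange for spreading writes.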
Query Routing
• Multiple query optimization models
• Each sharding option appropriate for different apps
Performance
Drag Strip: straight ahead, quarter-mile, stop
Road Race: stay fast, stay agile, continuous
Nürburgring, Germany
MongoDB at Scale
CARFAX
• Large data set
In-depth NoSQL evaluation
Baseline
• Vehicle History Database
• 11 billion records (growing at 1 billion per year)
• 30-year-old VMS-based RDBMS
• Cumbersome
• Costly
MongoDB Comparison
• Performance: 4x faster than baseline, 10x key-value
• Scale out using inexpensive commodity servers
• Built-in redundancy
• Flexible dynamic schema data model
• Strong consistency
• Analytics/aggregation
Initial Production
• MongoDB is primary data store
• 50 servers
• 10 shards
• 5-node replica sets per shard
CARFAX Sharding and Replication
• 13 billion+ documents
– 1.5 billion documents added every year
• 1 vehicle history report is > 200 documents
• 12 shards
• 9-node replica sets
• Replicas distributed across 3 data centers
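The CARFAX figures can be sanity-checked with simple arithmetic. The sketch below assumes that every shard is one replica set with the same member count and that documents are spread evenly across shards; both are simplifications, and the helper names are hypothetical.

```java
// Illustrative only: back-of-the-envelope sizing from the slide's figures.
class ClusterSizingSketch {

    /** Total replica set members when every shard has the same member count. */
    static int totalMembers(int shards, int membersPerReplicaSet) {
        return shards * membersPerReplicaSet;
    }

    /** Documents per shard, assuming a perfectly even distribution. */
    static long documentsPerShard(long totalDocuments, int shards) {
        return totalDocuments / shards;
    }

    public static void main(String[] args) {
        // Initial production: 10 shards x 5-node replica sets
        System.out.println(totalMembers(10, 5));   // prints 50, matching "50 servers"
        // Later deployment: 12 shards x 9-node replica sets
        System.out.println(totalMembers(12, 9));   // prints 108
        // 13 billion documents over 12 shards
        System.out.println(documentsPerShard(13_000_000_000L, 12));
    }
}
```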
CARFAX Replication
Foursquare
• 50M users
• 6B check-ins to date (6M per day growth)
• 55M points of interest / venues
• 1.7M merchants using the platform for marketing
• Operations per second: 300,000
• Documents: 5.5B (~16.5B with replication)*
Foursquare clusters
• 11 MongoDB clusters
– 8 are sharded
• Largest cluster for check-ins
• 15 shards (check-ins)
• Shard key: user_id
Facebook / parse.com mobile apps
• Persistent database for 270,000 mobile applications
• 200 M end-user mobile devices
• 250% annual growth in client apps
• 500% growth in requests
• 1.5 M collections
• Key differentiators:
– Document data model
– High performance and availability
– Geospatial query and index
• Charity Majors operations: j.mp/X3jVRC
– Understand your database and your data, and build for them.
Scalability Exercises in the Cloud with Amazon Web Services
Petascale Database
• 27x hs1.8xlarge instances
– 16x VCPU
– 24x 2TB SATA drives, RAID0
– 8x mongod microshards
• Modified Yahoo Cloud Serving Benchmark (YCSB)
– Long Integer IDs (>2B)
– Zipfian-distributed integer fields
– Aggregation queries
• Load direct to 216 shards, 10 days, $4K
"objects" : 7,170,648,489,
"avgObjSize" : 147,438.99952658816,
"dataSize" : NumberLong("1,057,240,224,818,640")
(commas added)
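The numbers on this slide can be cross-checked with a few lines of arithmetic. The sketch below takes the object count and data size as quoted, derives the shard count from instances times microshards, and compares the data size with both the decimal petabyte (10^15 bytes) and 2^50 bytes. Class and method names are illustrative.

```java
// Illustrative only: arithmetic on the figures quoted for the petascale test.
class PetascaleCheck {
    static final long OBJECTS = 7_170_648_489L;
    static final long DATA_SIZE_BYTES = 1_057_240_224_818_640L;
    static final long DECIMAL_PETABYTE = 1_000_000_000_000_000L; // 10^15 bytes
    static final long TWO_POW_50 = 1L << 50;                     // 2^50 bytes

    /** Shards in the cluster when every instance runs the same number of mongod processes. */
    static int totalShards(int instances, int mongodPerInstance) {
        return instances * mongodPerInstance;
    }

    /** Mean object size in bytes derived from total size and object count. */
    static double averageObjectSize(long dataSizeBytes, long objects) {
        return (double) dataSizeBytes / objects;
    }

    public static void main(String[] args) {
        System.out.println("shards: " + totalShards(27, 8)); // prints "shards: 216"
        System.out.println("over 10^15 bytes: " + (DATA_SIZE_BYTES > DECIMAL_PETABYTE));
        System.out.println("over 2^50 bytes: " + (DATA_SIZE_BYTES > TWO_POW_50));
        System.out.println("bytes per object: " + averageObjectSize(DATA_SIZE_BYTES, OBJECTS));
    }
}
```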
CGroup Memory Segregation
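for DB in `seq 0 3`; do
    # Create one memory cgroup per mongod instance, owned by the mongodb user
    sudo cgcreate \
        -a mongodb:mongodb \
        -t mongodb:mongodb \
        -g memory:mongodb$DB
    # Cap the cgroup at 48G; tee runs under sudo, which a plain shell redirect would not
    echo 48G | sudo tee \
        /sys/fs/cgroup/memory/mongodb$DB/memory.limit_in_bytes
    # Start mongod inside its cgroup with NUMA memory interleaving
    cgexec \
        -g memory:mongodb$DB \
        numactl --interleave=all \
        mongod --config ~/mongod$DB.conf
done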
Megawrite Ingest
• Ingest 250-byte stock quotes at 2M/s
• Concurrently run 5 QPS, subsecond/indexed response on timeStamp, accountId, instrumentId, systemKey
• 5x r3.4xlarge
– 16x VCPU, 1x 320GB SSD, 122GB RAM, 16x mongod
– 2.1M inserts/second direct to shards
• 16x c3.8xlarge
– 32x VCPU, 2x 320GB SSD, 60GB RAM, 16x mongod, 4x mongos
– 2.1M inserts/second via mongos
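A quick calculation puts the ingest target in perspective. The sketch below computes the raw payload rate for 250-byte quotes at 2M/s, and the per-process insert rate for the r3.4xlarge setup, assuming "16x mongod" means 16 processes on each of the 5 instances (an interpretation, not something the slide states). Names are hypothetical.

```java
// Illustrative only: throughput arithmetic for the ingest test.
class IngestThroughputSketch {

    /** Raw payload bytes per second, ignoring indexes, journal, and protocol overhead. */
    static long bytesPerSecond(int recordBytes, long recordsPerSecond) {
        return recordBytes * recordsPerSecond;
    }

    /** Inserts each process must absorb if load is spread evenly. */
    static long insertsPerProcess(long insertsPerSecond, int processes) {
        return insertsPerSecond / processes;
    }

    public static void main(String[] args) {
        // 250-byte quotes at 2M/s: 500,000,000 bytes/s of payload
        System.out.println(bytesPerSecond(250, 2_000_000L));
        // 2.1M inserts/s over 5 x 16 mongod processes (assumed layout): 26,250 each
        System.out.println(insertsPerProcess(2_100_000L, 5 * 16));
    }
}
```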
Java API comparison
• 2 threads on c3.8xlarge
• 264-byte BSON object (bsonsize), _id index only
• coll.insert(): 15,600 inserts/sec
• coll.insert(List<DBObject>), list size = 64: 118,000 inserts/sec
• Bulk ops API, size = 64: 120,000 inserts/sec
7x Load with BulkOp
BulkWriteOperation bo = null;
int pending = 0; // inserts queued in the current batch
for (a = 0; a < this.items && stayAlive; a++) {
    if (bo == null) {
        bo = collection.initializeUnorderedBulkOperation();
    }
    fillMap(this.m);
    BasicDBObject dbObject = new BasicDBObject(this.m);
    bo.insert(dbObject);
    // Send the batch once listsize inserts are queued
    if (++pending == listsize) {
        BulkWriteResult rc = bo.execute();
        bo = null;
        pending = 0;
    }
}
// Send any partial final batch
if (bo != null) {
    bo.execute();
}
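The gain from bulk operations comes largely from sending many inserts per round trip instead of one. The following driver-free sketch shows only the batching step: it splits a list of items into batches of a given size, including a final partial batch. The `partition` helper is hypothetical and stands in for whatever queues documents before a bulk execute.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: splitting work into fixed-size batches before sending.
class BatchingSketch {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> items = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            items.add(i);
        }
        List<List<Integer>> batches = partition(items, 64);
        // 1000 single inserts become 16 batched sends (15 full batches plus one of 40)
        System.out.println(batches.size() + " batches, last has "
                + batches.get(batches.size() - 1).size() + " items");
    }
}
```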
How do I Pick A Shard Key?
Shard Key characteristics
• A good shard key has:
– sufficient cardinality
– distributed writes
– targeted reads ("query isolation")
• Shard key should be in every query if possible
– scatter gather otherwise
• Choosing a good shard key is important!
– affects performance and scalability
– changing it later is expensive
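The difference between a targeted read and a scatter-gather can be sketched as a routing decision: if the query filter contains the shard key, one shard is enough; otherwise every shard must be asked. This is a simplified model of what a router does, not mongos itself. The `user_id` shard key comes from the Foursquare slide; `venue_id` and all class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Illustrative only: deciding which shards a query has to visit.
class QueryTargetingSketch {

    /** Returns the shard indexes a query with the given filter must be sent to. */
    static List<Integer> shardsToQuery(Map<String, Object> filter, String shardKeyField, int shardCount) {
        List<Integer> targets = new ArrayList<>();
        Object keyValue = filter.get(shardKeyField);
        if (keyValue != null) {
            // Targeted read: the shard key value identifies a single shard
            targets.add(Math.floorMod(keyValue.hashCode(), shardCount));
        } else {
            // Scatter-gather: without the shard key, every shard may hold matches
            for (int shard = 0; shard < shardCount; shard++) {
                targets.add(shard);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, Object> byUser = Collections.<String, Object>singletonMap("user_id", 42L);
        Map<String, Object> byVenue = Collections.<String, Object>singletonMap("venue_id", 7L);
        System.out.println("by user_id: " + shardsToQuery(byUser, "user_id", 15).size() + " shard(s)");
        System.out.println("by venue_id: " + shardsToQuery(byVenue, "user_id", 15).size() + " shard(s)");
    }
}
```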
Hashed shard key
• Pros:
– Evenly distributed writes
• Cons:
– Random data (and index) updates can be IO intensive
– Range-based queries turn into scatter gather
[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, ... Shard N]
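Why a range query degrades under hashing can be seen by collecting the shards touched by a run of consecutive keys. The sketch below uses a simple stand-in mixing function rather than MongoDB's actual hash, and all names are hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: consecutive keys land on different shards once hashed.
class HashedRangeQuerySketch {

    /** Stand-in hash placement; MongoDB's real hash function differs. */
    static int hashShard(long key, int shardCount) {
        long h = key * 0x9E3779B97F4A7C15L;
        h ^= (h >>> 32);
        return (int) Math.floorMod(h, (long) shardCount);
    }

    /** Shards that hold at least one key in [from, toExclusive). */
    static Set<Integer> shardsForRange(long from, long toExclusive, int shardCount) {
        Set<Integer> shards = new TreeSet<>();
        for (long key = from; key < toExclusive; key++) {
            shards.add(hashShard(key, shardCount));
        }
        return shards;
    }

    public static void main(String[] args) {
        // A query for keys 0..999 has to consult every shard listed here
        System.out.println(shardsForRange(0, 1000, 4));
    }
}
```

With range sharding the same thousand keys would sit in one chunk or a few adjacent ones, so the query could be routed to far fewer shards.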
Low cardinality shard key
• Induces "jumbo chunks"
• Examples: boolean field
[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, ... Shard N; chunk range [ a, b )]
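The problem with a low-cardinality key can be stated as a bound: a chunk cannot be split inside a single shard key value, so the number of shards that can hold data never exceeds the number of distinct key values. A minimal sketch of that bound follows; the helper name is hypothetical.

```java
// Illustrative only: how shard key cardinality limits data distribution.
class CardinalitySketch {

    /**
     * Upper bound on shards that can hold data, given that all documents
     * sharing one shard key value must stay in the same chunk.
     */
    static int maxUsableShards(int distinctKeyValues, int shardCount) {
        return Math.min(distinctKeyValues, shardCount);
    }

    public static void main(String[] args) {
        // A boolean shard key has two values, so at most 2 of 15 shards hold data
        System.out.println(maxUsableShards(2, 15)); // prints 2
    }
}
```

The remaining shards stay empty, and the two chunks grow without being splittable, which is what the slide calls "jumbo chunks".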
Ascending shard key
• Monotonically increasing shard key values cause "hot spots" on inserts
• Examples: timestamps, _id
[Diagram: mongos routing to Shard 1, Shard 2, Shard 3, ... Shard N; chunk range [ ISODate(…), $maxKey )]
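The hot spot can be reproduced with a toy model: with range sharding, every new key that is larger than all existing keys falls into the last range, the one ending at the maximum key. The sketch below counts inserts per shard for a run of increasing keys; the bounds and names are made up for illustration, and one chunk per shard is assumed.

```java
import java.util.Arrays;

// Illustrative only: increasing keys all land in the final range.
class AscendingKeyHotSpotSketch {

    /** Shard i owns [sortedLowerBounds[i], sortedLowerBounds[i + 1]); first bound is Long.MIN_VALUE. */
    static int rangeShard(long key, long[] sortedLowerBounds) {
        int idx = Arrays.binarySearch(sortedLowerBounds, key);
        return idx >= 0 ? idx : -idx - 2;
    }

    /** Counts how many of the consecutive keys starting at firstKey go to each shard. */
    static int[] insertCounts(long firstKey, int inserts, long[] sortedLowerBounds) {
        int[] counts = new int[sortedLowerBounds.length];
        for (int i = 0; i < inserts; i++) {
            counts[rangeShard(firstKey + i, sortedLowerBounds)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] bounds = {Long.MIN_VALUE, 1000, 2000, 3000};
        // New keys start above every existing bound, as fresh timestamps would
        System.out.println(Arrays.toString(insertCounts(5000, 100, bounds))); // prints [0, 0, 0, 100]
    }
}
```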
Ensuring Success with High Scalability
Success Factors
• Storage: random seeks (IOPS)
• RAM: working set based on query patterns
• Query: indexing
• Delete: most expensive operation
• Real-time vs. bulk operations
• Continuity: HA, DR, backup, restore
• Agile process: iterate by powers of 4
• Sharding: shard key and strategy
• Resources: don’t go it alone!