PetaMongo: A Petabyte Database for as Little as $200
DESCRIPTION
1,000,000,000,000,000 bytes. On demand. Online. Live. Big doesn't quite describe this data. Amazon Web Services makes it possible to construct highly elastic computing systems, and you can further increase cost efficiency by leveraging the Spot Pricing model for Amazon EC2. We showcase elasticity by demonstrating the creation and teardown of a petabyte-scale multiregion MongoDB NoSQL database cluster, using Amazon EC2 Spot Instances, for as little as $200 in total AWS costs. Oh, and it offers up four million IOPS to storage via the power of PIOPS EBS. Christopher Biow, Principal Technologist at 10gen | MongoDB, covers MongoDB best practices on AWS, so you can implement this NoSQL system (perhaps at a more pedestrian hundred-terabyte scale?) confidently in the cloud. You could build a massive enterprise warehouse, process a million human genomes, or collect a staggering number of cat GIFs. The possibilities are huMONGOus.
TRANSCRIPT
![Page 1: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/1.jpg)
© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
PetaMongo: A Petabyte Database for as Little as $200
Chris Biow, MongoDB
Miles Ward, AWS
November 13, 2013
![Page 2: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/2.jpg)
Agenda
• MongoDB on AWS review
  – Guidance, Storage, Architecture
• MongoDB at PetaScale on AWS
![Page 3: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/3.jpg)
Tools to simplify your design
• Whitepaper
• Marketplace
• CloudFormation
http://media.amazonwebservices.com/AWS_NoSQL_MongoDB.pdf
![Page 4: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/4.jpg)
• Easy to start a single node
• Correctly configured PIOPS EBS Storage
• No extra cost
https://aws.amazon.com/marketplace/pp/B00COAAEH8/ref=srh_res_product_title?ie=UTF8&sr=0-6&qid=1383897659043
![Page 5: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/5.jpg)
mongodb.org/display/DOCS/Automating+Deployment+with+CloudFormation
• Nested Templates
• Nodes and Storage
• Configurable Scale
• CloudFormation: Your Infrastructure belongs in your source control
CloudFormation
![Page 6: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/6.jpg)
AWS Storage Options
• EBS – Provisioned IOPS volumes
  • Deliver predictable, high performance for I/O intensive workloads
  • Specify IOPS required upfront, and EBS provisions for lifetime of volume
    – 4000 IOPS per volume, can stripe to get thousands of IOPS to an EC2 instance
• High IO Instances – hi1.4xlarge
  • For some applications that require tens of thousands of IOPS
  • Eliminates network latency/bandwidth as a performance constraint to storage
[Diagram labels: EBS PIOPS, SSD]
![Page 7: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/7.jpg)
AWS Storage Options
Testing: random 4k reads
• EBS
  – One volume: ~200 MongoOPS with some variability, <1 MB/s
  – Loaded instance: ~1,000 MongoOPS with some variability, <10 MB/s
• PIOPS+
  – One volume: 4,000 MongoOPS with <1% variability, 16 MB/s
  – Loaded instance: 16,000 MongoOPS with <1% variability, 64 MB/s
  – Loaded cluster instance: 80,000 MongoOPS, 320 MB/s
• SSD
  – hi1.4xlarge ephemeral: ~64,000 MongoOPS with low variability, ~245 MB/s
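The throughput figures on this slide line up with the operation rate multiplied by the 4 KB read size. A minimal sketch of that arithmetic, assuming decimal units (1 MB = 1,000 KB); the function name is illustrative and not from the talk.

```python
def throughput_mb_per_s(ops_per_second: int, read_size_kb: int = 4) -> float:
    """Convert a rate of fixed-size random reads into MB/s (1 MB = 1,000 KB)."""
    return ops_per_second * read_size_kb / 1000


# Operation rates quoted on the slide for PIOPS-backed storage.
for label, ops in [
    ("one PIOPS volume", 4_000),
    ("loaded instance", 16_000),
    ("loaded cluster instance", 80_000),
]:
    print(f"{label}: {ops:,} ops/s -> {throughput_mb_per_s(ops):.0f} MB/s")
```

The hi1.4xlarge figure (~64,000 operations at ~245 MB/s) is a measured approximation, so it does not match this product exactly.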
![Page 8: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/8.jpg)
Testing: random 4k reads
[Chart: EBS, SSD, and PIOPS+ compared; annotation: Stable]
![Page 9: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/9.jpg)
Stability Tips
• Ext4 or XFS, nodiratime, noatime
• Raise file descriptor limits
• Set disk read-ahead
• No large virtual memory pages
• SNAPSHOT SNAPSHOT SNAPSHOT
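One of these tips is easy to verify from a script: whether the data volume is mounted with `noatime` and `nodiratime`. The sketch below parses text in the `/proc/mounts` format; the device names, the `/data` mount point, and both function names are assumptions made for illustration.

```python
def mount_options(mounts_text, mount_point):
    """Return the option set for mount_point from /proc/mounts-style text."""
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, filesystem type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            return set(fields[3].split(","))
    return set()


def missing_atime_options(options):
    """Report which of the recommended atime options are absent."""
    return {"noatime", "nodiratime"} - set(options)


# Hypothetical sample; on a real host, read the text from /proc/mounts.
sample = (
    "/dev/xvdf /data xfs rw,noatime,nodiratime 0 0\n"
    "/dev/xvda1 / ext4 rw,relatime 0 0\n"
)
print(sorted(missing_atime_options(mount_options(sample, "/data"))))
print(sorted(missing_atime_options(mount_options(sample, "/"))))
```

Read-ahead, file descriptor limits, and huge page settings need separate checks that are not covered here.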
![Page 10: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/10.jpg)
• Retain a PIOPS EBS node for snapshot backups
• Snapshots allow cross-AZ and cross-region recovery
• SSD hosts as primary
• Shard for scale
![Page 11: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/11.jpg)
Another option…
244 GB cr1.8xlarge
![Page 12: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/12.jpg)
So, about that Petabyte
v.cheap
• Spot Market
• m1.small
• 1024 shards
• 1TB EBS from snapshot
• PowerBench reader
• Aggregation queries
v.fast
• AutoScaling On-Demand
• cc2.8xlarge
• 44 instances x 24 shards each
• 24TBx1K PIOPS indexed
• YCSB loader
• Aggregation queries
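The two configurations reach a petabyte by different routes. The sketch below works through the shard and capacity arithmetic. It assumes one 1 TB volume per shard in both cases and, for the fast cluster, the 4,000 provisioned IOPS per volume quoted on the later "24 x 1TB-4K PIOPS EBS" chart. That is one reading of the slides, not a confirmed configuration.

```python
TB_PER_SHARD = 1          # assumption: one 1 TB EBS volume per shard
PIOPS_PER_VOLUME = 4_000  # assumption: taken from the "1TB-4K PIOPS" chart title

# v.cheap: one shard per m1.small Spot instance.
cheap_shards = 1024
cheap_capacity_tb = cheap_shards * TB_PER_SHARD

# v.fast: 44 cc2.8xlarge instances, each hosting 24 shards.
fast_shards = 44 * 24
fast_capacity_tb = fast_shards * TB_PER_SHARD
fast_total_piops = fast_shards * PIOPS_PER_VOLUME

print(f"v.cheap: {cheap_shards} shards, {cheap_capacity_tb} TB")
print(f"v.fast:  {fast_shards} shards, {fast_capacity_tb} TB, {fast_total_piops:,} PIOPS")
```

Under these assumptions the fast cluster totals a little over four million provisioned IOPS, which agrees with the figure in the session description.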
![Page 13: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/13.jpg)
The naming of parts
Amazon Terms
• Provisioned IOPS
• Elastic Compute Cloud
• EC2 Spot Instances
• Auto Scaling groups
Nicks
• PIOPS
• EC2
• Here, Spot!
• ASG
![Page 14: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/14.jpg)
Players
![Page 15: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/15.jpg)
MongoDB
• Document-model, NoSQL database
• Dev adoption is STRONG
• MongoDB Inc. trending toward zero h/w
• Scale-up with commodity h/w
• Scale-out with sharding
• Scale-around with replication
![Page 16: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/16.jpg)
Dev Activity: stackoverflow.com
![Page 17: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/17.jpg)
AWS
• PIOPS for an IO-hungry client
• 40% of MongoDB customer usage
• 90% of MongoDB internal usage
• More ports :2701[79] than :[15]521
![Page 18: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/18.jpg)
PB & Chocolate
Differentiators for mutual customers
• Fast time-to-solution
• Easy global distribution
• Document model
• Secondary indexes
• Geo, text, security
• Fast analytic aggregation
![Page 19: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/19.jpg)
Challenge
![Page 20: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/20.jpg)
Motivation: IWBCI…
• Test scale-out of MongoDB beyond typical
• Learn massive scale-out on AWS
• Do it as cheaply as possible
• Apply customer data
• Break the petabarrier
![Page 21: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/21.jpg)
m1.small us-east-1 Spot Market
![Page 22: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/22.jpg)
m1.small us-east-1d Spot Market
![Page 23: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/23.jpg)
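Proposal

| Item | Units | Time | Unit Cost | Net Cost |
|---|---|---|---|---|
| m1.small Spot | 1050 | 3hr | $0.007/hr | $22.05 |
| m1.large | 3 | 48hrs | $0.056/hr | $8.07 |
| S3 | 1TB | 1wk | $95/TB/mo | $23.75 |
| EBS | 1024 x 1TB | 1hr | $100/TB/mo | $142.22 |
| S3 EBS | 1PB | lazy | $0/TB | $0.00 |
| Total | | | | $196.09 |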
http://calculator.s3.amazonaws.com/G77798SS77SH72
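The proposal total can be reproduced from the unit costs. This is a minimal sketch of the arithmetic, assuming a 720-hour month for the hourly EBS charge and one week billed as a quarter of a month for S3; both proration factors are assumptions chosen because they reproduce the slide's figures.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month
WEEKS_PER_MONTH = 4    # assumption: one week is billed as a quarter month

line_items = {
    # units * hours * $/hour
    "m1.small Spot": 1050 * 3 * 0.007,
    "m1.large": 3 * 48 * 0.056,
    # TB * $/TB/month, prorated
    "S3": 1 * 95 / WEEKS_PER_MONTH,
    "EBS": 1024 * 100 / HOURS_PER_MONTH,
    # Lazy loading of EBS volumes from S3 snapshots is listed at $0/TB.
    "S3 EBS (lazy)": 0.0,
}

total = sum(line_items.values())
for item, cost in line_items.items():
    print(f"{item:<15} ${cost:8.2f}")
print(f"{'Total':<15} ${total:8.2f}")
```

The m1.large line computes to $8.064, which the slide shows as $8.07; the other lines and the total match the table to the cent.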
![Page 24: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/24.jpg)
Initial Directions
• Spot instance requests
  – m1.small market, mostly us-east-1 (my zone “d”)
  – Net: $0.007 / hour = $7 / hr / K-shard
• Perl
  – use Net::Amazon::EC2;
  – gaps: parse EC2 command-line API
• Defer Chef, Puppet, CloudFormation
• YCSB
• userdata.sh
• t1.micro / m1.small / cr1.8xlarge
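The talk drove Spot requests from Perl with Net::Amazon::EC2. As a rough Python equivalent, the sketch below only builds the keyword arguments in the shape that boto3's EC2 `request_spot_instances` call accepts, so it runs without AWS credentials. The AMI ID, the user-data contents, the helper name, and the use of boto3 at all are assumptions for illustration, not the authors' tooling.

```python
import base64


def spot_request_params(count, max_price, instance_type, zone, image_id, user_data):
    """Build keyword arguments shaped like boto3's EC2 request_spot_instances."""
    return {
        "SpotPrice": f"{max_price:.3f}",  # the API takes the price as a string
        "InstanceCount": count,
        "Type": "one-time",
        "LaunchSpecification": {
            "ImageId": image_id,
            "InstanceType": instance_type,
            "Placement": {"AvailabilityZone": zone},
            # User data must be base64-encoded in a Spot launch specification.
            "UserData": base64.b64encode(user_data.encode()).decode(),
        },
    }


params = spot_request_params(
    count=1000,
    max_price=0.007,
    instance_type="m1.small",
    zone="us-east-1d",
    image_id="ami-00000000",  # hypothetical placeholder AMI
    user_data="#!/bin/bash\n# contents of userdata.sh would go here\n",
)

# Hourly cost at the quoted price: $0.007 * 1,000 instances = $7 per K-shard.
hourly_cost = params["InstanceCount"] * float(params["SpotPrice"])
print(f"${hourly_cost:.2f} per hour for {params['InstanceCount']} instances")

# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("ec2", region_name="us-east-1").request_spot_instances(**params)
```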
![Page 25: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/25.jpg)
MongoDB Architecture
• 3x Config Servers
  – mongod --configsvr
• Routing
  – mongos --configdb a,b,c
• Replica sets (not used)
• Shards
  – mongod
• Client load
  – java -cp [] com.yahoo.ycsb.Client
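The three process roles on this slide can be expressed as command lines that a launcher script assembles per instance. A minimal sketch follows. The helper names, the `/data/...` paths, and the explicit `--dbpath` and `--port` flags are illustrative additions; the comma-separated `--configdb a,b,c` form is taken from the slide and reflects the MongoDB releases of that period.

```python
def config_server_cmd(dbpath, port=27019):
    """Command line for one of the three config servers."""
    return ["mongod", "--configsvr", "--dbpath", dbpath, "--port", str(port)]


def router_cmd(config_hosts, port=27017):
    """Command line for a mongos router pointed at the config servers."""
    return ["mongos", "--configdb", ",".join(config_hosts), "--port", str(port)]


def shard_cmd(dbpath, port=27017):
    """Command line for a standalone shard (no replica set, as in the talk)."""
    return ["mongod", "--dbpath", dbpath, "--port", str(port)]


# Host names a, b, c are the placeholders used on the slide.
print(" ".join(config_server_cmd("/data/configdb")))
print(" ".join(router_cmd(["a", "b", "c"])))
print(" ".join(shard_cmd("/data/db")))
```

Each list could be passed to `subprocess.Popen` on the instance that plays that role.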
![Page 26: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/26.jpg)
![Page 27: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/27.jpg)
Range-based sharding
![Page 28: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/28.jpg)
Hash-based sharding
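The difference between the two sharding schemes shows up when keys arrive in increasing order, as they do in a bulk load. The sketch below is a simplified model, not MongoDB's implementation: range sharding is modelled as a lookup against sorted split points, and hash sharding as an MD5-derived value taken modulo the shard count (MongoDB assigns ranges of hash values to chunks instead). All names and split points are hypothetical.

```python
import bisect
import hashlib


def range_shard(key, split_points):
    """Range-based: index of the key range that contains the key."""
    return bisect.bisect_right(split_points, key)


def hash_shard(key, shard_count):
    """Hash-based (simplified): hash the key, then map the hash to a shard."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Three split points give four shards: <250, 250-499, 500-749, >=750.
split_points = [250, 500, 750]
new_keys = range(1000, 1100)  # monotonically increasing inserts

range_targets = {range_shard(k, split_points) for k in new_keys}
hash_targets = {hash_shard(k, 4) for k in new_keys}

print("range-based shards hit:", sorted(range_targets))
print("hash-based shards hit: ", sorted(hash_targets))
```

With range-based placement every new key lands on the last shard until chunks are split and migrated, while hashing spreads the same keys across shards from the start.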
![Page 29: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/29.jpg)
Process Flow
Spot Instance Request (sir-)
• Rejected
• Awaiting evaluation
• Awaiting fulfillment
  – Partial
  – Launch intervals
• Fulfilled
Instances (i-)
• Requested
• Initializing (i)
• Config running (C)
• MongoS starting (s)
• MongoS running (S)
• MongoD starting (D)
• Failed/slow response (X)
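The single-letter codes next to the instance states suggest a compact status display with one character per instance. The sketch below shows one way to render it. The state keys, the `.` code for a requested instance, and the `?` fallback are assumptions; only the letters i, C, s, S, D and X come from the slide.

```python
from collections import Counter

STATUS_CODES = {
    "requested": ".",        # assumption: the slide gives no code for this state
    "initializing": "i",
    "config_running": "C",
    "mongos_starting": "s",
    "mongos_running": "S",
    "mongod_starting": "D",
    "failed": "X",           # failed or slow response
}


def progress_line(states):
    """Render one character per instance, '?' for an unknown state."""
    return "".join(STATUS_CODES.get(state, "?") for state in states)


def summary(states):
    """Count instances per state for a quick health check."""
    return Counter(states)


fleet = ["config_running", "mongos_running", "mongod_starting", "initializing", "failed"]
print(progress_line(fleet))
print(summary(fleet)["failed"], "instance(s) need replacing")
```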
![Page 30: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/30.jpg)
Spot Instance Lifecycle
[Diagram: sir-, Config, MongoS, MongoD, Shard, Sharded]
![Page 31: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/31.jpg)
Progress
![Page 32: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/32.jpg)
Scale Out Experience
• Sharding by magnitude: 4, 16, 64, 256, 1024
• 4: functional validation
• 16: startup variation, process flow
• 64: full speed ahead!
• 256: chunk distribution time, single Config
• 1024: market dependence, client wire saturation
![Page 33: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/33.jpg)
Lessons Learned
• Code defensively
• Monitor: MongoDB Mgt Svc, top, iftop, iostat, mongostat
• Avoid sentimental attachment (i-8bad8bee)
• Prototype / refactor
• Make the instances do the work
• Mitigate chunk migration
![Page 34: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/34.jpg)
Refactor
• BenchPress YCSB
• Auto Scaling Groups request-spot-instances
• use VM::EC2; Net::Amazon::EC2
• gsh monolithic Perl
• serf polling
![Page 35: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/35.jpg)
![Page 36: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/36.jpg)
[Chart: 1KB Docs Loaded, 512 shards. X-axis: time, 5:16:48 to 7:40:48. Y-axis: 0 to 1,800,000,000. Annotation: 1X RAM.]
![Page 37: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/37.jpg)
[Chart: 1KB Docs Loaded, 1035 shards, 2 jobs conflicting. X-axis: time, 4:19:12 to 13:55:12. Y-axis: 0 to 2,500,000,000. Annotation: 1X RAM.]
![Page 38: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/38.jpg)
Dee-Luxe
![Page 39: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/39.jpg)
[Chart: cc2.8xlarge, 24 x 1TB-4K PIOPS EBS, bulk-load 64KB docs. Series: 64KB-docs, us-latency. Y-axis: 0 to 3,500,000. Annotation: 100% RAM.]
![Page 40: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/40.jpg)
[Chart: cc2.8xlarge, 24 x 1TB-4K PIOPS EBS, bulk-load 64KB docs. Series: 64KB-docs, us-latency. X-axis: time, 12:00:00 AM to 07:12:00 PM. Y-axis: 0 to 140,000,000.]
![Page 41: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/41.jpg)
![Page 42: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/42.jpg)
Further Work
• Completion
• Replication
• Self-healing
• MongoDB-appropriate benchmarks
• Customer data
• Self-hosting cluster
![Page 43: PetaMongo: A Petabyte Database for as Little as $200](https://reader037.vdocuments.us/reader037/viewer/2022110303/54b6cc0c4a79593a378b4583/html5/thumbnails/43.jpg)
Please give us your feedback on this presentation
As a thank you, we will select prize winners daily for completed surveys!
BDT307