MongoDB and the MEAN Stack

Ger Hartnett & Alan Spencer, MongoDB Dublin


Posted on 18-Jun-2015



Page 2: MongoDB and the MEAN Stack

Overview

• Fictional story of a startup using MongoDB & the MEAN stack to build an IoT application
• We’ll take a devops perspective - showing what to watch out for with a framework like MEAN
• Tips you can use to help the development team focus on the right things when close to production
• Questions: How many of you are from operations? How many from development?

Page 3: MongoDB and the MEAN Stack

5 Things we Learned

1. Capacity planning/prototyping is a good idea, but performance is sensitive to the sample test data
2. The MEAN stack rocks - fast to get started - and the profiler can help you understand what’s under the hood
3. Realtime/incremental aggregation works well with IoT workloads - the “MMS approach”
4. With NodeJS/Express, the number of app servers becomes the bottleneck before MongoDB does
5. Performance tuning patterns apply - “bottleneck whack-a-mole” & “slam-dunk optimization”

Page 4: MongoDB and the MEAN Stack

Context: IoT & MEAN

Page 5: MongoDB and the MEAN Stack

Internet of Things

“The rise of device oriented development … new architectural and workflow challenges … distinctly different from … web and mobile development so far.” - Morten Bagai

Big Data => Humongous Data

Page 6: MongoDB and the MEAN Stack


Internet of Things

• Bosch: “IoT brings root and branch changes to the world of business”

• Richard Kreuter's Webinar May 2013

• Earlier bootcamp looked at sharding IoT

Photo by jurvetson - Creative Commons Attribution License - http://www.flickr.com/photos/jurvetson/916142

Page 7: MongoDB and the MEAN Stack

MEAN stack

Express - web app framework/router
Angular - browser HTML/JS MVC
Node - JavaScript application server
MongoDB - the database

Photo by benmizen - Creative Commons ShareAlike License - http://www.flickr.com/photos/benmizen/9456440635

Page 8: MongoDB and the MEAN Stack

Learn more about MEAN

• Valeri Karpov - MongoDB Kernel Tools Team: http://thecodebarbarian.wordpress.com/2013/07/22/introduction-to-the-mean-stack-part-one-setting-up-your-tools/
• MEAN.io: http://mean.io

Page 9: MongoDB and the MEAN Stack

About MongoDB Bootcamp

• We invest in technical new hires
• Everyone does “bootcamp”
• NYC for 2 weeks - product internals
• Then work on a longer project for 3-4 weeks
• In our case: we wanted to do a bit of everything - capacity planning, iterating on user stories, with MongoDB as one component

Page 10: MongoDB and the MEAN Stack

The Application

Page 11: MongoDB and the MEAN Stack

Location based advertising - IoMT

• IoT example 3 from Richard’s Webinar

[Diagram: a Customer and several nearby Advertisers]

Page 12: MongoDB and the MEAN Stack

User Stories - for the application

• US1 - customer looks for advertisers nearby
• US2 - advertiser wants to see how many customers saw the offer
• US3 - find hot spots where there are many customers but few advertisers

Photo by consumerist - Creative Commons Attribution License - http://www.flickr.com/photos/consumerist/2158190589

Page 13: MongoDB and the MEAN Stack

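Document / Model / Controller

Document:

    { name: 'Long Hall', pos: [-6.265535, 53.3418364], kind: 'pub' }

Model (advertiser.js):

    var mongoose = require('mongoose');
    var Schema = mongoose.Schema;

    var AdvertiserSchema = new Schema({
      name: { type: String, default: '' },
      pos:  [Number],                          // [longitude, latitude]
      kind: { type: String, default: 'place' }
    });

    // geoSearch needs a geoHaystack index; bucketSize: 1 is an assumption,
    // the slide does not show the index definition.
    AdvertiserSchema.index({ pos: 'geoHaystack', kind: 1 }, { bucketSize: 1 });

    mongoose.model('Advertiser', AdvertiserSchema);

Controller (advertisers.js):

    var mongoose = require('mongoose');
    var Advertiser = mongoose.model('Advertiser');

    // GET ?lng=…&lat=…&dist=… : pubs near a point, via a haystack search
    exports.all = function(req, res) {
      var findQuery = {
        near: [Number(req.query.lng), Number(req.query.lat)],
        maxDistance: Number(req.query.dist)
      };
      Advertiser.geoSearch({ kind: 'pub' }, findQuery, function(err, advertisers) {
        if (err) {
          return res.status(500).jsonp({ error: err.message }); // error handling
        }
        res.jsonp(advertisers);
      });
    };

Haystack examples sent us in the wrong direction initially.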

Page 14: MongoDB and the MEAN Stack

CRUD interface & Mongoose

• CRUD interface
• Raised & fixed a bug in Mongoose; pull request merged

Page 16: MongoDB and the MEAN Stack

US1 Initial Measurements

• MongoDB shell scripts
• 9 advertisers, small area, distance 10km
• MongoDB has 5 kinds of geo query and 3 kinds of geo index
• geoSearch (haystack) looked much better than the others (our 1st mistake)
• TIP: performance is sensitive to test data & query

Page 18: MongoDB and the MEAN Stack

The good thing about frameworks is… they do lots of things for developers.

…and the bad thing about frameworks? They do lots of things for developers.

Page 19: MongoDB and the MEAN Stack

To find out what’s happening - debug

Console:

    Mongoose: clients.findOne({ _id: ObjectId("…") })
    Mongoose: advertisers.geoHaystack({…[-6.267765, 53.34087]})

• We used Express passport-http to add Basic/Digest auth (client id lookup)
• It can be hard to figure out what a framework like Express/Mongoose really does
• Tip: mongoose.set('debug', true) - detailed logging

Page 20: MongoDB and the MEAN Stack

Find out what’s happening - profiler

    > db.setProfilingLevel(2)     // profile all operations
    > db.system.profile.find()
    { "op" : "query", "ns" : "tings.clients", ... }
    { "op" : "command", "command" : { "geoSearch" ... } }
    { "op" : "update", "ns" : "tings.sessions" ... }

Tip: the MongoDB profiler shows the operations really happening on the DB; check them with the developers.

Where did that update come from? Fixing it is not obvious:

    exports.all = function(req, res) {
      . . .
          req.session = null;   // stops the session being saved back (the tings.sessions update)
          res.jsonp(advertisers);
    };

10% performance improvement
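When the profiler output gets long, it can help to tally it by operation and namespace before reviewing it with the developers. The following is a minimal sketch, assuming the `system.profile` documents have already been fetched into an array (for example from `db.system.profile.find()`); `summarizeProfile` and the sample entries are hypothetical, shaped like the ones on the slide. A write to `tings.sessions` on every request stands out as soon as it is counted.

```javascript
// Tally MongoDB profiler entries by operation type and namespace.
// Only the `op` and `ns` fields of each system.profile document are used.
function summarizeProfile(entries) {
  const counts = new Map();
  for (const entry of entries) {
    const key = entry.op + ' ' + entry.ns;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  // Most frequent operation/namespace pairs first.
  return [...counts.entries()]
    .map(([key, count]) => ({ key, count }))
    .sort((a, b) => b.count - a.count);
}

// Hypothetical sample: two requests, each looking up the client, running
// the geo search command and writing the session back.
const sampleProfile = [
  { op: 'query', ns: 'tings.clients' },
  { op: 'command', ns: 'tings.$cmd' },
  { op: 'update', ns: 'tings.sessions' },
  { op: 'query', ns: 'tings.clients' },
  { op: 'command', ns: 'tings.$cmd' },
  { op: 'update', ns: 'tings.sessions' },
];

for (const row of summarizeProfile(sampleProfile)) {
  console.log(row.count + '  ' + row.key);
}
```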

Page 21: MongoDB and the MEAN Stack

Back to the application

Page 22: MongoDB and the MEAN Stack

• US1 - customer looks for advertisers nearby
  • Need to store customer location
• US2 - advertiser wants to see how many customers are nearby
• US2 means we built on US1

Photo by consumerist - Creative Commons Attribution License - http://www.flickr.com/photos/consumerist/2158190589

Being a startup, we decided to take a naive, pragmatic approach:
• Store all samples
• US2 aggregates on demand

Page 24: MongoDB and the MEAN Stack

US2 - Aggregation of Raw Samples

• 1 hour of raw samples at 2k RPS = 2,000 × 3,600 = 7.2M documents
• Aggregation on 7.2M raw samples took 1 second on our instances
• Significant impact: running it every 2 seconds, RPS dropped by a factor of 4! (single instance)

[Diagram: Raw Insert into Samples; Query Aggregate runs over Samples to produce the Aggregate]
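The naive US2 query might look roughly like the pipeline below: match one advertiser's raw samples over a time window, then count distinct customers. This is a sketch under assumptions: `uniqueCustomersPipeline`, the raw-sample shape and its field names (`customerId`, `advertiserIds`, `ts`) are hypothetical. The point it illustrates is that every run has to scan all matching raw samples, which is why running it every 2 seconds was so expensive.

```javascript
// Build an aggregation pipeline that counts the distinct customers who
// were shown a given advertiser since `since`.
// Hypothetical raw sample shape:
//   { customerId, advertiserIds: [...], pos: [lng, lat], ts: Date }
function uniqueCustomersPipeline(advertiserId, since) {
  return [
    // Raw samples in the window whose advertiserIds array contains this advertiser.
    { $match: { advertiserIds: advertiserId, ts: { $gte: since } } },
    // One group per customer collapses repeat sightings.
    { $group: { _id: '$customerId' } },
    // Count the remaining groups.
    { $group: { _id: null, uniqueCustomers: { $sum: 1 } } },
  ];
}

// Example: the last hour of samples for one advertiser.
const oneHourAgo = new Date(Date.now() - 60 * 60 * 1000);
const pipeline = uniqueCustomersPipeline('adv-42', oneHourAgo);
console.log(JSON.stringify(pipeline));
// With the Node.js driver this would be passed to something like
// db.collection('samples').aggregate(pipeline)  (collection name hypothetical)
```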

Page 25: MongoDB and the MEAN Stack

US2 - Pre aggregation

[Diagram: Raw Insert into Samples; a Pre Aggregate Update into the Aggregate documents; Query Aggregate now reads the pre-aggregated documents]

• An MMS-type approach: one document per advertiser-customer-month
• Using update with multi: true (more on this later)
• The query now only needs to aggregate unique customers
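A minimal sketch of the per-sample write in this style, assuming one document per advertiser, customer and month with a monthly total and per-day counters, in the spirit of the MMS time-series schema post. `preAggregateUpdate` and all field names are hypothetical. Note that the slide used `update` with `multi: true`; this sketch instead shows a single upsert per document, which is a different (simpler) technique.

```javascript
// Build the filter/update for one advertiser-customer-month document,
// loosely following the MMS time-series schema. Field names are hypothetical.
function preAggregateUpdate(advertiserId, customerId, when) {
  const month = when.toISOString().slice(0, 7); // 'YYYY-MM' in UTC
  const day = String(when.getUTCDate());        // '1'..'31'
  return {
    filter: { advertiserId, customerId, month },
    update: {
      // Monthly total plus a per-day counter inside the same document.
      $inc: { total: 1, ['daily.' + day]: 1 },
    },
    // Create the document the first time this customer sees this advertiser.
    options: { upsert: true },
  };
}

const op = preAggregateUpdate('adv-42', 'cust-7', new Date(Date.UTC(2014, 5, 18, 12)));
console.log(JSON.stringify(op));
```

With the Node.js driver, each result could be applied with `collection.updateOne(op.filter, op.update, op.options)`. Because there is exactly one document per advertiser, customer and month, the number of unique customers for an advertiser in a month is simply the count of documents matching `{ advertiserId, month }`.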

Page 26: MongoDB and the MEAN Stack

US1 measurements revisited

• MongoDB shell scripts
• More realistic data: the old measurements used repeated locations
• 110k advertisers, with clusters in DUB and NYC
• Performance best for near and nearSphere (2x better than Haystack)
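For comparison with the haystack controller, a `$nearSphere` query for the same search might be built as below. This is a sketch assuming `pos` stays a legacy `[longitude, latitude]` pair with a `2d` index; `nearSphereQuery` is a hypothetical helper. For `$nearSphere` on legacy pairs, MongoDB expects `$maxDistance` in radians, so the distance in km is divided by the Earth's equatorial radius (6378.1 km, the figure used in MongoDB's documentation).

```javascript
// Equatorial radius that MongoDB's documentation uses when converting a
// distance to radians for legacy-coordinate spherical queries.
const EARTH_RADIUS_KM = 6378.1;

// Query for advertisers of a given kind within distKm of [lng, lat].
// Assumes `pos` holds legacy [longitude, latitude] pairs with a 2d index;
// with $nearSphere on legacy pairs, $maxDistance is in radians.
function nearSphereQuery(kind, lng, lat, distKm) {
  return {
    kind,
    pos: { $nearSphere: [lng, lat], $maxDistance: distKm / EARTH_RADIUS_KM },
  };
}

// Example: pubs within 10 km of the Long Hall.
const q = nearSphereQuery('pub', -6.265535, 53.3418364, 10);
console.log(JSON.stringify(q));
```

With the Mongoose model from the Document / Model / Controller slide this could be run as `Advertiser.find(q)`. A plain `$near` against a `2d` index would instead take `$maxDistance` in coordinate units (degrees), so the conversion differs.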

Page 27: MongoDB and the MEAN Stack

Where does the time go?

• Express/Mongoose/Node
• Customer lookup
• Find ($near)
• Save sample to DB
• Save sample to file
• Pre-aggregation = multiple docs (6)
• Pre-aggregation = multi-update of 1 doc
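One way to get a breakdown like this from inside the app is to wrap each stage of the request in a timer. The sketch below is an assumption, not the instrumentation behind the slide: `timeStage` and `report` are hypothetical helpers built on Node's `process.hrtime.bigint()`, and `sleep` stands in for the real stages.

```javascript
// Accumulated wall-clock time per named stage.
const stageTotals = new Map();

// Run `fn` (sync or async), add its elapsed time to the stage total,
// and pass its result (or error) through unchanged.
async function timeStage(name, fn) {
  const start = process.hrtime.bigint();
  try {
    return await fn();
  } finally {
    const elapsed = process.hrtime.bigint() - start;
    const prev = stageTotals.get(name) || { calls: 0, ns: 0n };
    stageTotals.set(name, { calls: prev.calls + 1, ns: prev.ns + elapsed });
  }
}

// Print the average time per call for each stage.
function report() {
  for (const [name, t] of stageTotals) {
    const avgMs = Number(t.ns / BigInt(t.calls)) / 1e6;
    console.log(`${name}: ${t.calls} calls, avg ${avgMs.toFixed(2)} ms`);
  }
}

// Stand-ins for real stages such as the customer lookup or the $near find.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

(async () => {
  await timeStage('customer lookup', () => sleep(5));
  await timeStage('find ($near)', () => sleep(10));
  report();
})();
```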

Page 29: MongoDB and the MEAN Stack

Deployment

[Diagram: Chrome:Postman and NodeLoad clients, HAproxy, several NodeJS app servers, MongoD]

Page 30: MongoDB and the MEAN Stack


Scaling

Page 32: MongoDB and the MEAN Stack

1 - number of Node.JS servers
2 - HAproxy
3 - load gen threads/BW

Page 33: MongoDB and the MEAN Stack

Pattern: “slam dunk optimization”

[Diagram: the deployment (Chrome:Postman, NodeLoad, HAproxy, NodeJS servers, MongoD*) with bottlenecks numbered 1, 2 and 3]

Page 34: MongoDB and the MEAN Stack

Performance tips

1. Increase the number of Node.JS servers
2. Increase the performance of the proxy/balancer instance
   • HAproxy was more balanced than Amazon ELB
3. Tweak Nodeload (generates/measures REST load)
   • Set Nodeload concurrency to 3x the number of Node servers
   • Run Nodeload on the same machine as HAproxy

Development recommendation: the Postman Chrome extension - generates REST requests / Basic Auth

Page 35: MongoDB and the MEAN Stack

Back to the application

Page 36: MongoDB and the MEAN Stack

US3 Overview

What are the top 10 hot sales areas?
• What is an “area”…?

Requirements:
• Little impact, easy to calculate
• Approximately regular size
• Optimal approximate distance - “bounding areas”
• Plays nice with sharding

Internals of haystack, 2dsphere? Polygon? MGRS?

Page 37: MongoDB and the MEAN Stack


US3 - Hot box - Sales, go sell!

Page 38: MongoDB and the MEAN Stack

MGRS - Military Grid Reference System

• e.g. 4QFJ123678: precision level 100m

Image by Mikael Rittri - Creative Commons ShareAlike License http://en.wikipedia.org/wiki/File:MGRSgridHawaiiSchemeAARealigned.png

Page 39: MongoDB and the MEAN Stack

MGRS - But at the poles…

Image by Mikael Rittri - Creative Commons ShareAlike License http://en.wikipedia.org/wiki/File:MGRSgridNorthPole.png

Page 40: MongoDB and the MEAN Stack

Introducing the ‘box’

Page 41: MongoDB and the MEAN Stack

The “box” - the poor-man’s MGRS

• Reinvented the sphere
• Long/lat -> box number
• Tailored to a specific distance
• Boxes are at least 1km
• Search in the current and 8 neighbouring boxes
• Filter out points outside the circle in JS
• Performed relatively well
• Can be used to shard
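The steps on this slide can be sketched as follows. This is a hypothetical reconstruction, not the authors' code: rows are 1 km bands of latitude, each row's column width in degrees of longitude is chosen so boxes are at least 1 km wide on the row's pole-most edge, the search covers the current box and its 8 neighbours, and a haversine distance drops candidates outside the circle. All names, the `"row:col"` box key, and the 6371 km mean Earth radius are assumptions; wrap-around at the antimeridian and the poles is not handled.

```javascript
// Hypothetical "box" grid sketch (not the authors' code).
const MEAN_EARTH_RADIUS_KM = 6371;
const KM_PER_DEG_LAT = (Math.PI * MEAN_EARTH_RADIUS_KM) / 180; // about 111.2 km
const BOX_KM = 1;                                   // boxes at least 1 km across
const ROW_DEG = BOX_KM / KM_PER_DEG_LAT;            // row height in degrees

const toRad = (deg) => (deg * Math.PI) / 180;

// Rows are bands of latitude, numbered from the south pole.
function rowOf(lat) {
  return Math.floor((lat + 90) / ROW_DEG);
}

// Column width in degrees of longitude for a row, sized on the row's
// pole-most edge so every box in the row is at least BOX_KM wide.
function colDegForRow(row) {
  const south = row * ROW_DEG - 90;
  const north = south + ROW_DEG;
  const poleMost = Math.min(Math.max(Math.abs(south), Math.abs(north)), 89.9);
  return Math.min(360, BOX_KM / (KM_PER_DEG_LAT * Math.cos(toRad(poleMost))));
}

function colOf(lng, row) {
  return Math.floor((lng + 180) / colDegForRow(row));
}

// Box number as a "row:col" string (usable as an index or shard key field).
function boxOf(lng, lat) {
  const row = rowOf(lat);
  return row + ':' + colOf(lng, row);
}

// The current box plus its 8 neighbours; columns are recomputed per row
// because column width depends on latitude. No antimeridian/pole wrap-around.
function searchBoxes(lng, lat) {
  const keys = [];
  const row = rowOf(lat);
  for (let r = row - 1; r <= row + 1; r++) {
    const col = colOf(lng, r);
    for (let c = col - 1; c <= col + 1; c++) keys.push(r + ':' + c);
  }
  return keys;
}

// Great-circle (haversine) distance between two [lng, lat] points.
function distanceKm([lng1, lat1], [lng2, lat2]) {
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a = Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * MEAN_EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
}

// Drop candidates that fall in the 9 boxes but outside the search circle.
function filterInCircle(center, radiusKm, candidates) {
  return candidates.filter((p) => distanceKm(center, p.pos) <= radiusKm);
}

// Usage: the search point from the debug slide and two candidates.
const here = [-6.267765, 53.34087];
const boxes = searchBoxes(here[0], here[1]);
const candidates = [
  { name: 'Long Hall', pos: [-6.265535, 53.3418364] },
  { name: 'far away', pos: [-6.2, 53.34] },          // hypothetical, ~4.5 km east
];
console.log(boxes.includes(boxOf(-6.265535, 53.3418364)));
console.log(filterInCircle(here, BOX_KM, candidates).map((p) => p.name));
```

In a query, `boxes` would feed an `$in` on a stored box field, and `filterInCircle` would trim the result; keeping the search radius no larger than the box size is what lets 9 boxes cover the whole circle.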

Page 42: MongoDB and the MEAN Stack


Replication

Page 43: MongoDB and the MEAN Stack

Impact of Replication

• Secondary reads
• Worked for this app
• Beware - don’t try this at home!
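Secondary reads are normally switched on in the driver configuration rather than in the app code. A minimal sketch, assuming the app connects with a standard MongoDB connection string: `replicaSetUri` is a hypothetical helper, and the host names and replica set name are made up; `replicaSet` and `readPreference` are standard connection string options. Reads served by a secondary can lag behind the primary, which is one reason for the warning on the slide.

```javascript
// Build a replica-set connection string; `replicaSet` and `readPreference`
// are standard MongoDB connection string options.
function replicaSetUri(hosts, dbName, replicaSet, readPreference) {
  const params = new URLSearchParams({ replicaSet, readPreference });
  return `mongodb://${hosts.join(',')}/${dbName}?${params}`;
}

// Hypothetical hosts and replica set name.
const uri = replicaSetUri(
  ['db1.example.com:27017', 'db2.example.com:27017', 'db3.example.com:27017'],
  'tings',
  'rs0',
  'secondaryPreferred'
);
console.log(uri);
```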

Page 44: MongoDB and the MEAN Stack

Apply the production notes

• Change from the default readahead
• Disable NUMA & THP (transparent huge pages)
• ext4 or XFS, with noatime
• Load test the workload on different configurations:
  • Instance Store / EBS (PIOPs)
  • SSDs / spinning rust
  • AWS instance types

Page 45: MongoDB and the MEAN Stack

Recap

Page 46: MongoDB and the MEAN Stack

5 Things we Learned

1. Capacity planning/prototyping is a good idea, but performance is sensitive to the sample test data
2. The MEAN stack rocks - fast to get started - but the profiler can help you understand what’s under the hood
3. Realtime/incremental aggregation works well with IoT workloads - the “MMS approach”
4. Performance tuning patterns apply - “bottleneck whack-a-mole” & “slam-dunk optimization”
5. With NodeJS/Express, the number of app servers becomes the bottleneck before MongoDB does

Page 47: MongoDB and the MEAN Stack

Next Steps

Page 48: MongoDB and the MEAN Stack

Next Steps

• Plan to publish as a blog post series and GitHub project
• Check blog.mongodb.org
• Continue to explore…

Page 49: MongoDB and the MEAN Stack

Next Steps - continuation

• Hadoop/YARN for aggregations
• Use the “box” to geo-shard
• Try 2.6 bulk updates
• Dynamic angular-google-maps with socket-io
• Implement in another framework (Go/Clojure) to load MongoDB with less hardware
• Find the balance between batch and pre-aggregation (see next slide)
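The “Try 2.6 bulk updates” item could apply to the pre-aggregation writes: instead of one round trip per advertiser-customer-month document, all the upserts for one customer sighting can be sent as one batch. This is a sketch under assumptions: `sightingToBulkOps`, the field names and the collection name are hypothetical, and the array uses the operation format accepted by the Node.js driver's `collection.bulkWrite()`.

```javascript
// Turn one customer sighting (the advertisers returned to that customer)
// into a batch of upserts, one per advertiser-customer-month document.
function sightingToBulkOps(customerId, advertiserIds, when) {
  const month = when.toISOString().slice(0, 7); // 'YYYY-MM' in UTC
  const day = String(when.getUTCDate());
  return advertiserIds.map((advertiserId) => ({
    updateOne: {
      filter: { advertiserId, customerId, month },
      update: { $inc: { total: 1, ['daily.' + day]: 1 } },
      upsert: true,
    },
  }));
}

const ops = sightingToBulkOps('cust-7', ['adv-1', 'adv-2', 'adv-3'], new Date());
console.log(ops.length + ' upserts in one batch');
// e.g. await db.collection('preaggregates').bulkWrite(ops, { ordered: false });
```

An unordered batch lets the server apply the independent upserts without stopping at the first error, which suits counters that do not depend on each other.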

Page 50: MongoDB and the MEAN Stack

Learn More & Thank You

• Introduction to MEAN - Valeri Karpov: http://thecodebarbarian.wordpress.com/2013/07/22/introduction-to-the-mean-stack-part-one-setting-up-your-tools/
• MEAN.io: http://mean.io
• Richard Kreuter's webinar - M2M: http://www.mongodb.com/presentations/webinar-realizing-promise-machine-machine-m2m-mongodb
• Building MongoDB Into Your Internet of Things: http://blog.mongohq.com/building-mongodb-into-your-internet-of-things-a-tutorial/
• Schema design for time series data (MMS): http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb
