Back to Basics Webinar, Part 6
Sam Weaver
Solution Architect, MongoDB
#MongoDBBasics
‘Build an Application’ Webinar Series
Deploying your application in production
Agenda
• Replica Sets Lifecycle
• Developing with Replica Sets
• Scaling your database
Q&A
• Virtual Genius Bar
– Use chat to post questions
– EMEA Solution Architecture / Support team are on hand
– Make use of them during the sessions!
Recap
• Introduction to MongoDB
• Schema design
• Interacting with the database
• Indexing
• Analytics
– Map Reduce
– Aggregation Framework
Deployment Considerations
Working Set Exceeds Physical Memory
Why Replication?
• How many have faced node failures?
• How many have been woken up in the middle of the night to perform a failover?
• How many have experienced issues due to network latency?
• Different uses for data
– Normal processing
– Simple analytics
Replica Set Lifecycle
Replica Set – Creation
Replica Set – Initialize
Replica Set – Failure
Replica Set – Failover
Replica Set – Recovery
Replica Set – Recovered
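The creation and initialization steps above come down to passing a configuration document to `rs.initiate()` on one member. A minimal sketch of that document's shape (hostnames here are hypothetical), built as a plain object so the majority rule behind failover can be illustrated:

```javascript
// Sketch of a replica set configuration document, shaped like the one
// you would pass to rs.initiate() in the mongo shell. Hosts are hypothetical.
const rsConfig = {
  _id: "mySet",                  // replica set name
  members: [
    { _id: 0, host: "A:27017" },
    { _id: 1, host: "B:27017" },
    { _id: 2, host: "C:27017" }
  ]
};

// After a failure, a strict majority of members must be reachable
// to elect a new primary:
const majority = Math.floor(rsConfig.members.length / 2) + 1;
console.log(majority); // 2 of 3 members
```

This is why replica sets are typically deployed with an odd number of members: a 3-member set survives one failure, while a 2-member set cannot form a majority after losing either node.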
Developing with Replica Sets
Strong Consistency
Delayed Consistency
Write Concern
• Network acknowledgement
• Wait for error
• Wait for journal sync
• Wait for replication
Unacknowledged
MongoDB Acknowledged (wait for error)
Wait for Journal Sync
Wait for Replication
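The four levels above trade latency for durability: each step waits for stronger confirmation before returning to the application. A toy model (not driver code) of how a `w` value is checked against acknowledgements from replica set members:

```javascript
// Toy model of write-concern checking; real MongoDB does this server-side.
// acks: names of members that have replicated a given write.
// w: a member count, or "majority".
function writeConcernSatisfied(acks, w, totalMembers) {
  const needed = w === "majority"
    ? Math.floor(totalMembers / 2) + 1
    : w;
  return acks.length >= needed;
}

// In a 3-member set, the primary's ack alone satisfies w:1,
// but replicating to a second member is needed for "majority".
writeConcernSatisfied(["A"], 1, 3);               // true
writeConcernSatisfied(["A"], "majority", 3);      // false
writeConcernSatisfied(["A", "B"], "majority", 3); // true
```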
Tagging
• Control where data is written to, and read from
• Each member can have one or more tags
– tags: {dc: "ny"}
– tags: {dc: "ny", subnet: "192.168", rack: "row3rk7"}
• Replica set defines rules for write concerns
• Rules can change without changing app code
{
  _id : "mySet",
  members : [
    {_id : 0, host : "A", tags : {"dc": "ny"}},
    {_id : 1, host : "B", tags : {"dc": "ny"}},
    {_id : 2, host : "C", tags : {"dc": "sf"}},
    {_id : 3, host : "D", tags : {"dc": "sf"}},
    {_id : 4, host : "E", tags : {"dc": "cloud"}}
  ],
  settings : {
    getLastErrorModes : {
      allDCs : {"dc" : 3},
      someDCs : {"dc" : 2}
    }
  }
}
> db.blogs.insert({...})
> db.runCommand({getLastError : 1, w : "someDCs"})
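The `someDCs` rule above is satisfied once the write has been acknowledged by members spanning two distinct values of the `dc` tag, not simply two members. A small sketch of that counting logic:

```javascript
// Checks a getLastErrorModes-style rule such as {dc: 2}: the write must be
// acknowledged by members covering at least 2 distinct values of the "dc" tag.
function modeSatisfied(ackedMembers, rule) {
  return Object.entries(rule).every(([tag, needed]) => {
    const distinct = new Set(
      ackedMembers.map(m => m.tags[tag]).filter(v => v !== undefined)
    );
    return distinct.size >= needed;
  });
}

const acked = [
  { host: "A", tags: { dc: "ny" } },
  { host: "B", tags: { dc: "ny" } }
];
modeSatisfied(acked, { dc: 2 }); // false: both acks are from "ny"
modeSatisfied(acked.concat([{ host: "C", tags: { dc: "sf" } }]), { dc: 2 }); // true
```

This is what lets the ops team redefine "safe enough" (e.g. changing `someDCs` from 2 to 3 data centers) in the replica set configuration without touching application code.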
Tagging Example
Wait for Replication (Tagging)
Read Preference Modes
• 5 modes
– primary (only) – default
– primaryPreferred
– secondary
– secondaryPreferred
– nearest
• When more than one node is eligible, the closest node is used for reads (all modes except primary)
Tagged Read Preference
• Custom read preferences
• Control where you read from by (node) tags
– e.g. { "disk": "ssd", "use": "reporting" }
• Use in conjunction with standard read preferences
– Except primary
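Combining a mode with tags can be sketched as two filters: first restrict candidates by mode (e.g. secondaries only), then keep members whose tags contain every requested key/value pair. The member data below is hypothetical:

```javascript
// Selects candidate members for a tagged read preference.
// mode "secondary" restricts to secondaries; all tag pairs must match.
function eligibleMembers(members, mode, tags) {
  return members
    .filter(m => (mode === "secondary" ? !m.isPrimary : true))
    .filter(m =>
      Object.entries(tags).every(([k, v]) => m.tags[k] === v)
    );
}

const members = [
  { host: "A", isPrimary: true,  tags: { disk: "ssd" } },
  { host: "B", isPrimary: false, tags: { disk: "ssd", use: "reporting" } },
  { host: "C", isPrimary: false, tags: { disk: "spinning" } }
];

eligibleMembers(members, "secondary", { disk: "ssd", use: "reporting" });
// → only host "B" qualifies
```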
Our application
• SAFE writes are acceptable for our use case
• Potential to use secondary reads for comments, but probably not needed
• Use tagged reads for analytics
Scaling
Working Set Exceeds Physical Memory
When to consider Sharding?
• When a specific resource becomes a bottleneck on a machine or replica set
– RAM
– Disk IO
– Storage
– Concurrency
Vertical Scalability (Scale Up)
Horizontal Scalability (Scale Out)
Partitioning
• User defines shard key
• Shard key defines range of data
• Key space is like points on a line
• Range is a segment of that line
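A chunk is simply a contiguous segment of that shard-key line, owned by one shard. Routing a document or query to its shard is then a lookup against those segments; a sketch with hypothetical numeric ranges:

```javascript
// Each chunk owns a half-open shard-key range [min, max) on one shard.
const chunks = [
  { min: -Infinity, max: 100,      shard: "shard0" },
  { min: 100,       max: 500,      shard: "shard1" },
  { min: 500,       max: Infinity, shard: "shard2" }
];

// Find the shard owning a given shard-key value.
function shardForKey(key) {
  return chunks.find(c => key >= c.min && key < c.max).shard;
}

shardForKey(42);  // "shard0"
shardForKey(100); // "shard1" (ranges are min-inclusive, max-exclusive)
```

This range map is exactly the metadata the config servers store and mongos caches to route requests.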
Initially 1 chunk
Default max chunk size: 64 MB
MongoDB automatically splits & migrates chunks when max reached
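Splitting can be sketched as cutting an oversized chunk at a middle key; migration then moves whole chunks between shards to rebalance. A toy version (real splits choose split points from the key distribution, not a simple byte count):

```javascript
// Toy split: when a chunk's size exceeds the maximum, cut it at a middle key.
const MAX_CHUNK_BYTES = 64 * 1024 * 1024; // default max chunk size: 64 MB

function maybeSplit(chunk, middleKey) {
  if (chunk.bytes <= MAX_CHUNK_BYTES) return [chunk];
  return [
    { min: chunk.min, max: middleKey, bytes: chunk.bytes / 2 },
    { min: middleKey, max: chunk.max, bytes: chunk.bytes / 2 }
  ];
}

const big = { min: 0, max: 1000, bytes: 80 * 1024 * 1024 };
maybeSplit(big, 500); // two chunks: [0, 500) and [500, 1000)
```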
Data Distribution
Architecture
What is a Shard?
• Shard is a node of the cluster
• Shard can be a single mongod or a replica set
Meta Data Storage
• Config Server
– Stores cluster chunk ranges and locations
– Can have only 1 or 3 (production must have 3)
– Not a replica set
Routing and Managing Data
• Mongos
– Acts as a router / balancer
– No local data (persists to config database)
– Can have 1 or many
Sharding infrastructure
Cluster Request Routing
• Targeted Queries
• Scatter Gather Queries
• Scatter Gather Queries with Sort
Cluster Request Routing: Targeted Query
Routable request received
Request routed to appropriate shard
Shard returns results
Mongos returns results to client
Cluster Request Routing: Non-Targeted Query
Non-Targeted Request Received
Request sent to all shards
Shards return results to mongos
Mongos returns results to client
Cluster Request Routing: Non-Targeted Query with Sort
Non-Targeted request with sort received
Request sent to all shards
Query and sort performed locally
Shards return results to mongos
Mongos merges sorted results
Mongos returns results to client
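The sorted scatter-gather above is a k-way merge: each shard sorts its own results locally, and mongos repeatedly takes the smallest remaining head across all shards. A minimal sketch:

```javascript
// k-way merge of per-shard result batches, each already sorted ascending.
function mergeSorted(shardResults) {
  const merged = [];
  // Copy so we can shift() without mutating the caller's arrays.
  const queues = shardResults.map(r => r.slice());
  while (queues.some(q => q.length > 0)) {
    // Pick the non-empty queue whose head element is smallest.
    let best = null;
    for (const q of queues) {
      if (q.length && (best === null || q[0] < best[0])) best = q;
    }
    merged.push(best.shift());
  }
  return merged;
}

mergeSorted([[1, 4, 9], [2, 3, 10], [5]]); // [1, 2, 3, 4, 5, 9, 10]
```

Because the expensive sort happens on the shards, mongos only pays for the cheap merge, which is why a sorted non-targeted query is still practical at scale.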
Shard Key
Shard Key
• Shard key is immutable
• Shard key values are immutable
• Shard key must be indexed
• Shard key limited to 512 bytes in size
• Shard key used to route queries
– Choose a field commonly used in queries
• Only the shard key can be unique across shards
– `_id` field is only unique within an individual shard
A suitable shard key for our app…
• Occurs in most queries
• Routes to each shard
• Is granular enough that chunks can stay under the 64 MB limit
• Any candidates?
– Author?
– Date?
– _id?
– Title?
– Author & Date?
Summary
Things to remember
• Size appropriately for your working set
• Shard when you need to, not before
• Pick a shard key wisely
Next Session – 17th April
• Backup and Disaster Recovery
• Backup and restore options
Thank you