Transcript
Page 1: Advanced Replication

Solutions Architect, 10gen

Marc Schwering

#MongoDBDays - @m4rcsch

Advanced Replication

Page 2: Advanced Replication

Roles & Configuration

Page 3: Advanced Replication

Replica Set Roles

Page 4: Advanced Replication

> conf = {

_id : "mySet",

members : [

{_id : 0, host : "A"},

{_id : 1, host : "B"},

{_id : 2, host : "C", arbiterOnly : true}

]

}

> rs.initiate(conf)

Configuration Options
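A minimal sketch of what this three-member configuration means for elections, written as plain JavaScript that runs without a MongoDB server. The helper `majorityOf` is hypothetical (not a shell function) and assumes each member has exactly one vote; `arbiterOnly` is the member option that marks an arbiter.

```javascript
// Hypothetical helper (not part of the mongo shell): votes needed for a
// majority, assuming every member has exactly one vote.
function majorityOf(conf) {
  return Math.floor(conf.members.length / 2) + 1;
}

// Same shape as the slide's configuration; the arbiter votes but holds no data.
const conf = {
  _id: "mySet",
  members: [
    { _id: 0, host: "A" },
    { _id: 1, host: "B" },
    { _id: 2, host: "C", arbiterOnly: true }
  ]
};

// Members that actually store a copy of the data.
const dataBearing = conf.members.filter(m => !m.arbiterOnly).length;

console.log("votes needed:", majorityOf(conf), "data-bearing members:", dataBearing);
```

With three voters, two votes are enough to elect a primary, so the set can lose any single member and still hold an election.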

Page 5: Advanced Replication

Simple Setup Demo

Page 6: Advanced Replication

Behind the Curtain

Page 7: Advanced Replication

Implementation details

• Heartbeat every 2 seconds
  – Times out in 10 seconds

• Local DB (not replicated)
  – system.replset
  – oplog.rs
    • Capped collection
    • Idempotent version of operation stored
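The heartbeat timing can be pictured with a very simplified model. This is only an illustration of the two numbers on the slide, not MongoDB's actual implementation; the constant and function names are hypothetical.

```javascript
// Values taken from the slide: heartbeat every 2 s, timeout after 10 s.
const HEARTBEAT_INTERVAL_MS = 2000;
const HEARTBEAT_TIMEOUT_MS = 10000;

// Hypothetical helper: treat a member as unreachable once no heartbeat
// reply has been seen for the full timeout.
function isUnreachable(lastReplyMs, nowMs) {
  return nowMs - lastReplyMs >= HEARTBEAT_TIMEOUT_MS;
}

// Number of heartbeat intervals that fit into one timeout window.
const intervalsPerTimeout = HEARTBEAT_TIMEOUT_MS / HEARTBEAT_INTERVAL_MS;

console.log(isUnreachable(0, 4000), isUnreachable(0, 10000), intervalsPerTimeout);
```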

Page 8: Advanced Replication

Op(erations) Log

Page 9: Advanced Replication

> db.replsettest.insert({_id:1,value:1})

{ "ts" : Timestamp(1350539727000, 1), "h" : NumberLong("6375186941486301201"), "op" : "i", "ns" : "test.replsettest", "o" : { "_id" : 1, "value" : 1 } }

> db.replsettest.update({_id:1},{$inc:{value:10}})

{ "ts" : Timestamp(1350539786000, 1), "h" : NumberLong("5484673652472424968"), "op" : "u", "ns" : "test.replsettest", "o2" : { "_id" : 1 }, "o" : { "$set" : { "value" : 11 } } }

Op(erations) Log is idempotent
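Why the $inc is stored as a $set: a sketch that replays simplified oplog entries against an in-memory map. The `applyOp` function is hypothetical and handles only the two entry shapes shown above ("i" and "u" with $set); it is not how mongod applies operations internally.

```javascript
// Apply one simplified oplog entry to an in-memory collection (Map keyed by _id).
// Only the "i" and "u"/$set shapes from the slide are handled.
function applyOp(coll, entry) {
  if (entry.op === "i") {
    coll.set(entry.o._id, Object.assign({}, entry.o));
  } else if (entry.op === "u") {
    const doc = coll.get(entry.o2._id);
    if (doc) Object.assign(doc, entry.o.$set);
  }
}

const oplog = [
  { op: "i", ns: "test.replsettest", o: { _id: 1, value: 1 } },
  { op: "u", ns: "test.replsettest", o2: { _id: 1 }, o: { $set: { value: 11 } } }
];

const coll = new Map();
oplog.forEach(e => applyOp(coll, e));
applyOp(coll, oplog[1]); // replay the update a second time
const replayed = coll.get(1).value;

// For contrast: applying the original {$inc: {value: 10}} twice would drift.
const incTwice = 1 + 10 + 10;

console.log(replayed, incTwice);
```

Replaying the stored $set leaves the value unchanged, while replaying the original $inc would not, which is why the oplog records the resulting value.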

Page 10: Advanced Replication

oplog and multi-updates

Page 11: Advanced Replication

> db.replsettest.update({}, {$set : {name : "foo"}}, false, true)

{ "ts" : Timestamp(1350540395000, 1), "h" : NumberLong("-4727576249368135876"), "op" : "u", "ns" : "test.replsettest", "o2" : { "_id" : 2 }, "o" : { "$set" : { "name" : "foo" } } }

{ "ts" : Timestamp(1350540395000, 2), "h" : NumberLong("-7292949613259260138"), "op" : "u", "ns" : "test.replsettest", "o2" : { "_id" : 3 }, "o" : { "$set" : { "name" : "foo" } } }

{ "ts" : Timestamp(1350540395000, 3), "h" : NumberLong("-1888768148831990635"), "op" : "u", "ns" : "test.replsettest", "o2" : { "_id" : 1 }, "o" : { "$set" : { "name" : "foo" } } }

Single operation can have many entries
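A sketch of how one multi-document update fans out into per-document entries, each addressed by _id. `expandMultiUpdate` is a hypothetical helper for illustration; the real entries also carry the "ts" and "h" fields, which are omitted here.

```javascript
// Hypothetical sketch: expand a multi-document $set into one oplog-style
// entry per matched document, each targeting a single _id.
function expandMultiUpdate(ns, matchedDocs, setFields) {
  return matchedDocs.map(doc => ({
    op: "u",
    ns: ns,
    o2: { _id: doc._id },
    o: { $set: setFields }
  }));
}

// The three documents matched by the empty query {} in the slide's example.
const matched = [{ _id: 2 }, { _id: 3 }, { _id: 1 }];
const entries = expandMultiUpdate("test.replsettest", matched, { name: "foo" });

console.log(entries.length, JSON.stringify(entries[0]));
```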

Page 12: Advanced Replication

Operations

Page 13: Advanced Replication

Maintenance and Upgrade

• No downtime

• Rolling upgrade/maintenance
  – Start with Secondary
  – Primary last
  – Commands:
    • rs.stepDown(<secs>)
    • db.version()
    • db.serverBuildInfo()
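The rolling order can be sketched as a small function over member entries shaped like those in rs.status() (`name` and `stateStr`). `upgradeOrder` is a hypothetical helper and the host names are made up; it only computes the order and does not restart anything.

```javascript
// Hypothetical helper: order members for a rolling upgrade,
// every non-primary member first and the current primary last.
function upgradeOrder(members) {
  const others = members.filter(m => m.stateStr !== "PRIMARY");
  const primaries = members.filter(m => m.stateStr === "PRIMARY");
  return others.concat(primaries).map(m => m.name);
}

// Made-up members in the shape reported by rs.status().members.
const members = [
  { name: "A:27017", stateStr: "PRIMARY" },
  { name: "B:27017", stateStr: "SECONDARY" },
  { name: "C:27017", stateStr: "SECONDARY" }
];

const order = upgradeOrder(members);
console.log(order.join(" -> "));
```

Before the last step, rs.stepDown(<secs>) on the primary lets a freshly upgraded secondary take over.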

Page 14: Advanced Replication

Upgrade Demo

Page 15: Advanced Replication

Replica Set – 1 Data Center

• Single datacenter

• Single switch & power

• Points of failure:
  – Power
  – Network
  – Data center
  – Two node failure

• Automatic recovery of single node crash

Page 16: Advanced Replication

Replica Set – 2 Data Centers

• Multi data center

• DR node for safety

• Can't do a durable multi data center write safely, since only 1 node is in the distant DC

Page 17: Advanced Replication

Replica Set – 2 Data Centers

• Analytics

• Disaster Recovery

• Batch Jobs

• Options
  – low or zero priority
  – hidden
  – slaveDelay

Page 18: Advanced Replication

Replica Set – 3 Data Centers

• Three data centers

• Can survive full data center loss

• Can do w= { dc : 2 } to guarantee write in 2 data centers (with tags)
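Whether a layout survives the loss of a whole data center comes down to whether a majority of members remains. A minimal sketch, assuming one vote per member and made-up layouts; `survivesLossOf` is a hypothetical helper.

```javascript
// Hypothetical helper: does a majority of the set remain if every member
// tagged with the given data center is lost? Assumes one vote per member.
function survivesLossOf(members, lostDc) {
  const majority = Math.floor(members.length / 2) + 1;
  const remaining = members.filter(m => m.tags.dc !== lostDc).length;
  return remaining >= majority;
}

// Made-up layouts: two data centers (2 + 1) versus three (2 + 2 + 1).
const twoDCs = [
  { host: "A", tags: { dc: "ny" } },
  { host: "B", tags: { dc: "ny" } },
  { host: "C", tags: { dc: "sf" } }
];
const threeDCs = [
  { host: "A", tags: { dc: "ny" } },
  { host: "B", tags: { dc: "ny" } },
  { host: "C", tags: { dc: "sf" } },
  { host: "D", tags: { dc: "sf" } },
  { host: "E", tags: { dc: "cloud" } }
];

console.log(survivesLossOf(twoDCs, "ny"), survivesLossOf(threeDCs, "ny"));
```

In the two data center layout, losing "ny" leaves one member out of three, so no primary can be elected; the three data center layout keeps a majority whichever single site fails.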

Page 19: Advanced Replication

Replica Set – 3+ Data Centers

[Diagram: replica set across 3+ data centers with one Primary and six Secondaries, one of them delayed]

Page 20: Advanced Replication

Commands

• Managing
  – rs.conf()
  – rs.initiate(<conf>) & rs.reconfig(<conf>)
  – rs.add("<host>:<port>") & rs.addArb("<host>:<port>")
  – rs.status()
  – rs.stepDown(<secs>)

• Minority reconfig
  – rs.reconfig( cfg, { force : true} )

Page 21: Advanced Replication

Options

• Priorities

• Hidden

• Slave Delay

• Disable indexes (on secondaries)

• Default write concerns
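These options map onto fields of the replica set configuration: `priority`, `hidden`, `slaveDelay` and `buildIndexes` on a member, and `getLastErrorDefaults` under `settings`. The sketch below shows a made-up configuration plus a hypothetical `checkMember` helper that encodes one constraint: hidden, delayed, and index-less members need priority 0.

```javascript
// Hypothetical helper: list the problems with one member document.
// A missing priority is treated as the default of 1.
function checkMember(m) {
  const problems = [];
  const priority = m.priority === undefined ? 1 : m.priority;
  if (m.hidden && priority !== 0) problems.push("hidden requires priority 0");
  if (m.slaveDelay > 0 && priority !== 0) problems.push("slaveDelay requires priority 0");
  if (m.buildIndexes === false && priority !== 0) problems.push("buildIndexes:false requires priority 0");
  return problems;
}

// Made-up configuration using the options from the slide.
const conf = {
  _id: "mySet",
  members: [
    { _id: 0, host: "A", priority: 2 },
    { _id: 1, host: "B" },
    { _id: 2, host: "C", priority: 0, hidden: true, slaveDelay: 3600 },
    { _id: 3, host: "D", priority: 0, buildIndexes: false }
  ],
  settings: { getLastErrorDefaults: { w: "majority" } }
};

const problems = conf.members.map(checkMember);
const badMember = checkMember({ _id: 4, host: "E", hidden: true });

console.log(JSON.stringify(problems), JSON.stringify(badMember));
```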

Page 22: Advanced Replication

Developing with Replica Sets

Page 23: Advanced Replication

Strong Consistency

Page 24: Advanced Replication

Delayed Consistency

Page 25: Advanced Replication

Write Concern

• Network acknowledgement

• Wait for error

• Wait for journal sync

• Wait for replication
  – number
  – majority
  – Tags
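Each level corresponds to options on the getLastError command (`j`, `w`, `wtimeout`). A sketch that only builds the command documents; `gleCommand` is a hypothetical helper, "someDCs" is assumed to be a mode defined in the set's getLastErrorModes, and nothing is sent to a server.

```javascript
// Hypothetical helper: build a getLastError command document for
// db.runCommand() from a few write concern options.
function gleCommand(opts) {
  const cmd = { getLastError: 1 };
  if (opts.j) cmd.j = true;                                      // wait for journal sync
  if (opts.w !== undefined) cmd.w = opts.w;                      // number, "majority", or a tag mode name
  if (opts.wtimeout !== undefined) cmd.wtimeout = opts.wtimeout; // milliseconds
  return cmd;
}

const waitForError = gleCommand({});
const journaled = gleCommand({ j: true });
const twoNodes = gleCommand({ w: 2 });
const majority = gleCommand({ w: "majority", wtimeout: 500 });
const tagged = gleCommand({ w: "someDCs" });

[waitForError, journaled, twoNodes, majority, tagged].forEach(c => console.log(JSON.stringify(c)));
```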

Page 26: Advanced Replication

Write Concern Demo

Page 27: Advanced Replication

Datacenter awareness (Tagging)

• Control where data is written to, and read from

• Each member can have one or more tags
  – tags: {dc: "ny"}
  – tags: {dc: "ny", subnet: "192.168", rack: "row3rk7"}

• Replica set defines rules for write concerns

• Rules can change without changing app code

Page 28: Advanced Replication

{

_id : "mySet",

members : [

{_id : 0, host : "A", tags : {"dc": "ny"}},

{_id : 1, host : "B", tags : {"dc": "ny"}},

{_id : 2, host : "C", tags : {"dc": "sf"}},

{_id : 3, host : "D", tags : {"dc": "sf"}},

{_id : 4, host : "E", tags : {"dc": "cloud"}}],

settings : {

getLastErrorModes : {

allDCs : {"dc" : 3},

someDCs : {"dc" : 2}} }

}

> db.blogs.insert({...})

> db.runCommand({getLastError : 1, w : "someDCs"})

> db.getLastErrorObj("someDCs")

Tagging Example
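How a mode such as someDCs : {"dc" : 2} is evaluated: the number is the count of distinct values of that tag among the members that have acknowledged the write. A standalone sketch; `modeSatisfied` is a hypothetical helper, not a server or shell function.

```javascript
// A mode such as { dc: 2 } is satisfied once the acknowledging members
// cover at least 2 distinct values of the "dc" tag.
function modeSatisfied(mode, ackedMembers) {
  return Object.keys(mode).every(tag => {
    const values = new Set(
      ackedMembers.map(m => m.tags && m.tags[tag]).filter(v => v !== undefined)
    );
    return values.size >= mode[tag];
  });
}

// Members and modes as in the tagging example.
const A = { host: "A", tags: { dc: "ny" } };
const B = { host: "B", tags: { dc: "ny" } };
const C = { host: "C", tags: { dc: "sf" } };
const E = { host: "E", tags: { dc: "cloud" } };
const modes = { allDCs: { dc: 3 }, someDCs: { dc: 2 } };

console.log(
  modeSatisfied(modes.someDCs, [A, B]),   // both in ny: one distinct dc
  modeSatisfied(modes.someDCs, [A, C]),   // ny + sf: two distinct dcs
  modeSatisfied(modes.allDCs, [A, C, E])  // ny + sf + cloud
);
```

Two acknowledgements from the same data center do not satisfy someDCs, which is the difference between a tag mode and a plain w : 2.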

Page 29: Advanced Replication

Wait for Replication

Page 30: Advanced Replication

settings : {

getLastErrorModes : {

allDCs : {"dc" : 3},

someDCs : {"dc" : 2}} }

}

> db.getLastErrorObj("allDCs", 100);

> db.getLastErrorObj("someDCs", 500);

> db.getLastErrorObj(1,500);

Write Concern with timeout

Page 31: Advanced Replication

Read Preference Modes

• 5 modes (new in 2.2)
  – primary (only) - Default
  – primaryPreferred
  – secondary
  – secondaryPreferred
  – nearest

When more than one node is possible, closest node is used for reads (all modes but primary)
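A simplified model of how the five modes narrow down the candidate members. The functions and the `state` / `pingMs` fields are hypothetical, and picking the single lowest ping is a simplification of what drivers actually do when several members are close.

```javascript
// Hypothetical model: which members may serve a read under each mode.
function candidates(mode, members) {
  const primary = members.filter(m => m.state === "PRIMARY");
  const secondaries = members.filter(m => m.state === "SECONDARY");
  switch (mode) {
    case "primary": return primary;
    case "primaryPreferred": return primary.length ? primary : secondaries;
    case "secondary": return secondaries;
    case "secondaryPreferred": return secondaries.length ? secondaries : primary;
    case "nearest": return primary.concat(secondaries);
    default: throw new Error("unknown mode: " + mode);
  }
}

// Simplification: among the candidates, take the one with the lowest ping.
// Returns null when no member qualifies.
function pick(mode, members) {
  return candidates(mode, members).reduce(
    (best, m) => (best === null || m.pingMs < best.pingMs ? m : best),
    null
  );
}

const members = [
  { host: "A", state: "PRIMARY", pingMs: 20 },
  { host: "B", state: "SECONDARY", pingMs: 5 },
  { host: "C", state: "SECONDARY", pingMs: 40 }
];
const withoutPrimary = members.filter(m => m.state !== "PRIMARY");

console.log(pick("primary", members).host, pick("nearest", members).host,
            pick("primaryPreferred", withoutPrimary).host);
```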

Page 32: Advanced Replication

Tagged Read Preference

• Custom read preferences

• Control where you read from by (node) tags
  – E.g. { "disk": "ssd", "use": "reporting" }

• Use in conjunction with standard read preferences
  – Except primary

Page 33: Advanced Replication

{"dc.va": "rack1", disk:"ssd", ssd: "installed" }

{"dc.va": "rack2", disk:"raid"}

{"dc.gto": "rack1", disk:"ssd", ssd: "installed" }

{"dc.gto": "rack2", disk:"raid"}

> conf.settings = { getLastErrorModes: { MultipleDC :

{ "dc.va": 1, "dc.gto": 1}}}

> conf.settings = {

"getLastErrorModes" : {

"ssd" : {

"ssd" : 1

},...

Tags

Page 34: Advanced Replication

{ disk: "ssd" }

JAVA:

ReadPreference tagged_pref =

ReadPreference.secondaryPreferred(

new BasicDBObject("disk", "ssd")

);

DBObject result =

coll.findOne(query, null, tagged_pref);

Tagged Read Preference

Page 35: Advanced Replication

Tagged Read Preference

• Grouping / Failover

{dc : "LON", loc : "EU"}

{dc : "FRA", loc : "EU"}

{dc : "NY", loc : "US"}

DBObject t1 = new BasicDBObject("dc", "LON");

DBObject t2 = new BasicDBObject("loc", "EU");

ReadPreference pref =

ReadPreference.primaryPreferred(t1, t2);
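The failover behaviour of an ordered list of tag sets can be sketched in plain JavaScript: tag sets are tried in order and the first one that matches any available member wins. The helpers below are hypothetical and model only the tag matching; with primaryPreferred, the tag sets come into play only when the primary is unavailable.

```javascript
// A member matches a tag set when it carries every key/value pair in it.
function matchesTagSet(member, tagSet) {
  return Object.keys(tagSet).every(k => member.tags[k] === tagSet[k]);
}

// Try each tag set in order; return the members matching the first one that hits.
function selectByTagSets(members, tagSets) {
  for (const tagSet of tagSets) {
    const hits = members.filter(m => matchesTagSet(m, tagSet));
    if (hits.length > 0) return hits;
  }
  return [];
}

const all = [
  { host: "lon1", tags: { dc: "LON", loc: "EU" } },
  { host: "fra1", tags: { dc: "FRA", loc: "EU" } },
  { host: "ny1", tags: { dc: "NY", loc: "US" } }
];
const tagSets = [{ dc: "LON" }, { loc: "EU" }]; // t1, then t2

const normal = selectByTagSets(all, tagSets).map(m => m.host);
const lonDown = selectByTagSets(all.filter(m => m.host !== "lon1"), tagSets).map(m => m.host);

console.log(normal, lonDown);
```

Reads stay in London while a London member is available and fall back to another EU member, here Frankfurt, when it is not. The host names are made up.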

Page 36: Advanced Replication

Conclusion

Page 37: Advanced Replication

Best practices and tips

• Odd number of set members

• Read from the primary except for
  – Geographic distribution
  – Analytics (separate workload)

• Use logical names not IP Addresses in configs

• Set WriteConcern appropriately for what you are doing

• Monitor secondaries for lag (Alerts in MMS)
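Lag can be estimated by comparing each secondary's `optimeDate` with the primary's, both of which appear in the members array of rs.status(). A sketch over made-up status data; `lagSeconds` is a hypothetical helper and the 10-second alert threshold is an arbitrary assumption.

```javascript
// Hypothetical helper: seconds each secondary is behind the primary,
// based on the optimeDate fields of an rs.status()-shaped document.
function lagSeconds(status) {
  const primary = status.members.find(m => m.stateStr === "PRIMARY");
  if (!primary) return []; // no primary to compare against
  return status.members
    .filter(m => m.stateStr === "SECONDARY")
    .map(m => ({
      name: m.name,
      lag: (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000
    }));
}

// Made-up status document with the same field names as rs.status().
const status = {
  members: [
    { name: "A:27017", stateStr: "PRIMARY", optimeDate: new Date("2012-10-18T06:00:30Z") },
    { name: "B:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-10-18T06:00:28Z") },
    { name: "C:27017", stateStr: "SECONDARY", optimeDate: new Date("2012-10-18T05:59:30Z") }
  ]
};

const ALERT_AFTER_SECONDS = 10; // arbitrary threshold for this sketch
const lags = lagSeconds(status);
const lagging = lags.filter(l => l.lag > ALERT_AFTER_SECONDS).map(l => l.name);

console.log(JSON.stringify(lags), lagging);
```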

Page 38: Advanced Replication

Solutions Architect, 10gen

Marc Schwering

#MongoDBDays - @m4rcsch

Thank You

