Post on 25-Jun-2020


Open Source Backups For MongoDB

David Murphy MongoDB Practice Manager


About me

▪ Former NoSQL/MySQL Architect for Electronic Arts (and yes, I probably worked on that game you’re thinking about!)

▪ Original and Lead DBA for ObjectRocket, the high-performance Mongo-as-a-Service offering

▪ Mongo Masters alumnus, and one of the early Mongo Masters

▪ Practice Manager for MongoDB @ Percona

▪ 15+ years in MySQL and other RDBMSs


Agenda

•  Today’s typical backup types:

▪  Logical

▪  Snapshot (iSCSI/LVM)

▪  Ops Manager backups

•  Complications when it comes to sharding

•  How to get consistent sharded backups in v3.2+

•  New tool from Percona Labs

Today’s Backup Types

A look at today's single-node and replica set backups, and the good and the bad in each


Today’s Tools: Logical Backups

Logical backups almost always use mongodump, which has some particular considerations:

▪ You must determine which secondary you want to talk to

- The --host (-h) option points at a single host or at a replica set (using secondary reads)

- Does not protect against lagging secondaries

▪ A single node cannot be consistently backed up!

- A standalone node has no oplog, so mongodump's --oplog option cannot be used, and because MongoDB reads are read-uncommitted, the dump is not a safe point-in-time copy

▪ Restores take a very long time (every document is reinserted and every index rebuilt), but the space used is small
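To make the options above concrete, here is a minimal Python sketch that only builds a mongodump command line; it does not run anything. The function name, host names and paths are hypothetical. `--host`, `--oplog`, `--gzip` and `--out` are real mongodump options (`--gzip` needs mongodump 3.2 or later). The sketch refuses `--oplog` for a standalone, since a standalone mongod has no oplog to capture.

```python
def build_mongodump_cmd(hosts, out_dir, replset=None, oplog=True, gzip=True):
    """Build (but do not run) a mongodump argv list for one host or replica set."""
    if oplog and replset is None:
        # A standalone mongod has no oplog, so a consistent dump is impossible
        raise ValueError("--oplog needs a replica set member; a standalone mongod has no oplog")
    # A replica set seed list is written as "<replSetName>/<host1>,<host2>"
    target = f"{replset}/{hosts}" if replset else hosts
    cmd = ["mongodump", "--host", target]
    if oplog:
        cmd.append("--oplog")   # capture oplog entries written during the dump
    if gzip:
        cmd.append("--gzip")    # compress output (mongodump 3.2+)
    cmd += ["--out", out_dir]
    return cmd


if __name__ == "__main__":
    print(build_mongodump_cmd("db1:27017,db2:27017", "/backups/rs0", replset="rs0"))
```

To actually take the dump, pass the list to `subprocess.run(cmd, check=True)`.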


Today’s Tools: Snapshot Backups (LVM)

Assuming you are using LVM, there are some considerations. Backups will always be 100% of the data size, but they restore quickly, with no need to “re-hydrate” the data.

▪ Snapshots can be made instantly

- You must choose which node to take the backup on (usually people add a hidden node for this)

- Backups use 100% of the normal space, and compressing them slows restores

- The volume group (VG) needs spare space for a snapshot volume

- The snapshot's copy-on-write (COW) table keeps growing until it runs out of space, and then the snapshot stops

- Serious performance issues will occur while the snapshot is active

- Delete the snapshot as soon as possible, once you have rsync'd its contents somewhere else

▪ Restores are fast and consistent

- They only take as long as copying the files back into place
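The LVM workflow above can be sketched as an ordered list of commands. This is a dry-run sketch under stated assumptions: the volume group, logical volume, snapshot size, mount point, snapshot name and rsync destination are all hypothetical, the journal is assumed to live on the same volume as the data files, and nothing is executed.

```python
import shlex


def lvm_snapshot_plan(vg, lv, snap_size, mount_point, dest):
    """Return the ordered commands for one LVM snapshot backup (nothing is executed)."""
    snap = f"{lv}-backup-snap"              # hypothetical snapshot name
    snap_dev = f"/dev/{vg}/{snap}"
    return [
        # 1. Instant copy-on-write snapshot; snap_size is the COW space taken from the VG
        ["lvcreate", "--snapshot", "--size", snap_size, "--name", snap, f"/dev/{vg}/{lv}"],
        # 2. Mount it read-only (XFS may additionally need the nouuid option)
        ["mount", "-o", "ro", snap_dev, mount_point],
        # 3. Copy the files somewhere else while the snapshot is still valid
        ["rsync", "-a", f"{mount_point}/", dest],
        # 4. Drop the snapshot ASAP: writes are slower for as long as it exists
        ["umount", mount_point],
        ["lvremove", "--force", snap_dev],
    ]


if __name__ == "__main__":
    for cmd in lvm_snapshot_plan("vg0", "mongodb", "10G", "/mnt/mongo-snap", "backuphost:/srv/mongo/"):
        print(shlex.join(cmd))
```

If you run these for real (for example with `subprocess`), wrap the mount and rsync steps in `try`/`finally` so the snapshot is always unmounted and removed.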


Today’s Tools: Snapshot Backups (iSCSI/NFS)

Everything said about LVM regarding instant snapshots and fast, consistent restores still applies; however:

▪ Depending on the NFS/SAN used, the COW table may no longer be an expense

▪ Deduplication can be used to help save space

▪ Incremental hourly snapshots might be possible

▪ MongoDB performs poorly on iSCSI by default and might need tuning

▪ Due to the nature of NFS, MongoDB (especially with MMAPv1) typically should not be run on it in production


Today’s Tools: MongoDB Ops Manager (MOM)

Using this tool, you are choosing NOT to be open source, and you are locked into a vendor!

▪ Initial Backups

- Sends documents to the MOM server in 10 MB chunks, then sends all oplog changes

- Builds a copy of the DB and applies the oplogs

- Marks this as the first completed backup

▪ Oplog streaming

- From then on, it simply streams new oplog entries and applies them to the backup, just as replication does

▪ Snapshots

- At regular points in time, the current version of the DB copy is cloned

- New oplog entries are applied to only one side

- This gives you snapshots, perhaps daily or hourly, that you can return to
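The initial-backup, oplog-streaming and snapshot phases above can be illustrated with a toy in-memory model. This is not how Ops Manager is implemented: documents live in a dict, oplog entries are simplified to whole-document insert/replace/delete, and `snapshot_every` is a hypothetical parameter standing in for the daily or hourly schedule.

```python
import copy


def apply_op(db, op):
    """Apply one simplified oplog entry: 'i' insert, 'u' replace, 'd' delete."""
    kind, doc_id = op["op"], op["_id"]
    if kind in ("i", "u"):
        db[doc_id] = op["doc"]          # simplification: updates replace the whole document
    elif kind == "d":
        db.pop(doc_id, None)
    else:
        raise ValueError(f"unknown op {kind!r}")


def run_backup(initial_docs, oplog, snapshot_every):
    """Initial copy, then oplog streaming, cloning the copy every `snapshot_every` ops."""
    db = copy.deepcopy(initial_docs)    # initial backup: a copy of every document
    snapshots = []
    for n, op in enumerate(oplog, start=1):
        apply_op(db, op)                # stream the oplog into the copy, like replication
        if n % snapshot_every == 0:
            # Frozen clone: later ops are applied to `db` only, never to this snapshot
            snapshots.append(copy.deepcopy(db))
    return db, snapshots
```

The key design point is the clone: once taken, a snapshot is never touched again, so it stays a fixed point you can return to while the live copy keeps absorbing the oplog.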

Complications when it comes to sharding

How do we back up when we are using shards? How do we time things well?


Sharding Complications: Consistency

Different shards will finish each type of backup at different times

▪ Logical Backups

- Each shard is a different size, so backups finish at different times

- Dumps taken through mongos cannot use --oplog, and therefore won’t be consistent across shards

- As different dumps finish at different times, three questions come up:

- Is the Balancer off?

- Are there any migrations running?

- What about new DBs and manual moves?

▪ Binary/Snapshot Backups

- These work great on a single replica set, but how do I make them all run at the same time?

- How do I make sure the above questions are answered?
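The timing questions above can be checked mechanically once you know when each shard's dump finished and when chunk migrations ran. The sketch below uses hypothetical record shapes (plain dicts with `begin`/`end` datetimes, not MongoDB's own config collections): it measures how far apart the shard dumps finished and lists the migrations that overlapped the backup window.

```python
from datetime import datetime, timedelta


def finish_skew_seconds(finish_times):
    """Seconds between the first and the last shard dump finishing."""
    return (max(finish_times.values()) - min(finish_times.values())).total_seconds()


def migrations_during_backup(backup_start, finish_times, migrations):
    """Migrations whose [begin, end] window overlaps [backup_start, last dump finish]."""
    backup_end = max(finish_times.values())
    return [m for m in migrations if m["begin"] <= backup_end and m["end"] >= backup_start]


if __name__ == "__main__":
    start = datetime(2020, 1, 1, 0, 0)
    finished = {"rs0": start + timedelta(minutes=20), "rs1": start + timedelta(minutes=55)}
    moves = [{"chunk": "a", "begin": start + timedelta(minutes=30), "end": start + timedelta(minutes=31)}]
    print(finish_skew_seconds(finished))  # 2100.0
    print([m["chunk"] for m in migrations_during_backup(start, finished, moves)])  # ['a']
```

A migration that lands inside that window (like chunk `a`, which moved after `rs0` had already finished) is exactly the case where documents can be missing from, or duplicated across, the per-shard dumps.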


Sharding Complications: Tools’ Sharding Support

▪ Logical Backups

- No native support for consistent sharded backups

▪ Binary/Snapshot Backups

- No native support for consistent sharded backups

▪ MongoDB Ops Manager

- Only snapshot support, no Point-In-Time Recovery (PITR) support

Consistent Sharded Backups and 3.2+

The new design, in which the config servers are a replica set, has solved a very complicated backup issue: point-in-time recovery of a sharded cluster.


What does 3.2 help fix?

3.2 is a HUGE leap forward for operations groups backing up MongoDB. Having the config servers be a replica set allows all parts of the system to be handled as one:

▪  If someone was able to run a snapshot at exactly the same time on all shards and a config server, then this wasn’t an issue. However, tiny timing variations could cause a change to be missed, and therefore recovery tests to fail.

▪  There was no good way to work out how to roll each shard forward to Backup + 1 hour and then update all the config metadata to match. Now we can say "restore everything to Backup + 1 hour" and know it is safe and exactly what the system looked like at that time.

▪  Some more tooling is still needed to capture the oplog constantly for that case, but at least it is possible to do now.
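"Restore everything to Backup + 1 hour" amounts to cutting every replica set's oplog at the same timestamp; with 3.2, the config server replica set is simply one more oplog to cut. A simplified sketch: timestamps are plain integer seconds (real oplog `ts` values are BSON Timestamps that also carry an increment), and the replica set names are hypothetical.

```python
def replay_plan(oplogs, backup_ts, offset_seconds):
    """
    oplogs: {replica_set_name: [{"ts": seconds, ...}, ...]} for every shard
            plus the config server replica set (3.2+ config servers have an oplog).
    Returns (target_ts, {replica_set_name: entries to replay, in ts order}).
    """
    target_ts = backup_ts + offset_seconds
    plan = {}
    for rs, entries in oplogs.items():
        # Same cut-off for shards and config servers, so all land on one point in time
        window = [e for e in entries if backup_ts < e["ts"] <= target_ts]
        plan[rs] = sorted(window, key=lambda e: e["ts"])
    return target_ts, plan


if __name__ == "__main__":
    oplogs = {
        "shard0": [{"ts": 1010}, {"ts": 5000}],
        "configRS": [{"ts": 2000}],
    }
    print(replay_plan(oplogs, 1000, 3600))  # Backup + 1 hour
```

Before 3.2 the config servers had no oplog of their own, which is why the metadata could not be rolled forward to the same point as the shards.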

Percona-Lab’s new backup tool

What if there was a tool that let you point at a replica set or cluster, and it would take care of backing up the shards, aligning the recovery point, and compressing everything into one central, logical place? What if that was only the first step, with binary/snapshot backups, incremental backups, and more coming?


IT’S ALL TRUE: mongodb-consistent-backup

What is this new tool?

•  Not officially supported by Percona until it makes it into Percona Tools directly

•  A Python tool, shipped as a single binary file, that only needs Python 2.7; it builds its own virtual environment so as not to have complicated dependencies

•  Intelligent enough to detect mongos, recurse through the cluster, and back up all your shards automatically, while being flexible enough to back up just one replica set if you point it at one. Single mongods won’t work with this tool, because mongodump can’t consistently back them up!

•  Ensures all shards’ dump times are consistent with each other by keeping oplog tailers open on all shards until the last dump finishes

•  At this point the process branches by version:

▪  If 3.2+ - It has also been dumping/tailing the config servers, so everything is consistent :)

▪  If 3.0 or before - It fsync-locks a config server and dumps it at the last moment, with the balancer off for the whole backup.
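The alignment step described above can be modeled like this: the common recovery point is the moment the last dump finished, and each replica set replays the tailed oplog entries between its own dump's end and that point. This is a simplified model of the idea, not the tool's code; timestamps are plain integers and the replica set names are hypothetical.

```python
def align_to_last_dump(dump_end, tailed):
    """
    dump_end: {replica_set: ts at which its dump finished}
    tailed:   {replica_set: oplog entries captured by its tailer, each {"ts": ...}}
    Returns (recovery_point, {replica_set: entries to replay on top of its dump}).
    """
    # Tailers stay open until the slowest dump finishes; that moment is the common point
    recovery_point = max(dump_end.values())
    to_replay = {}
    for rs, end in dump_end.items():
        # Keep only what happened after this dump ended, up to the common point
        to_replay[rs] = [e for e in tailed.get(rs, []) if end < e["ts"] <= recovery_point]
    return recovery_point, to_replay


if __name__ == "__main__":
    point, plan = align_to_last_dump(
        {"rs0": 100, "rs1": 160},
        {"rs0": [{"ts": 130}], "rs1": [{"ts": 161}]},
    )
    print(point, plan)
```

Entries the tailers captured after the recovery point (like `rs1`'s entry at 161) are discarded, so every shard ends up at the same moment in time.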


Where is it going?

The vision:

•  Mature it enough to eventually move it into a Percona-supported tool, like XtraBackup and Percona Toolkit

▪  Remain 100% open source and free to the community

▪  Community involvement: telling us what features you need, contributing improvements, and reporting bugs

•  Have a daemon process constantly collecting oplogs for each shard and storing them as one file per hour per shard

▪  Allow incremental backups and recovery granular to the second, while letting you control retention based on your budget

•  Uploading to S3, Google Cloud Storage, Azure ZRS, Rackspace Cloud Files, and other services

•  Restore tools, to make the backup process more automated

•  Modular backup methods such as LVM, mongodump, iSCSI backups, MongoDB admin commands, and more

•  Encryption support

•  Ability to filter some collections/databases out of backups and restores

•  Offline backup querying
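To illustrate how the planned "one file per hour per shard" layout enables recovery granular to the second, the sketch below maps a timestamp to an hourly file and lists the files a restore to an arbitrary second would need. The file naming scheme and function names are hypothetical; the roadmap does not specify them.

```python
from datetime import datetime, timezone

SECONDS_PER_HOUR = 3600


def hourly_file(shard, ts):
    """Hypothetical layout: one oplog file per shard per UTC hour."""
    hour = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y%m%d%H")
    return f"{shard}/oplog-{hour}.bson"


def files_needed(shard, backup_ts, target_ts):
    """Hourly files a restore must read to roll from backup_ts to target_ts (seconds)."""
    if target_ts < backup_ts:
        raise ValueError("target_ts must not be earlier than backup_ts")
    first_hour = backup_ts - backup_ts % SECONDS_PER_HOUR
    # Replay then stops at the exact second target_ts inside the last file
    return [hourly_file(shard, h) for h in range(first_hour, target_ts + 1, SECONDS_PER_HOUR)]


if __name__ == "__main__":
    print(files_needed("rs0", 1800, 7300))
```

Retention then becomes a matter of deleting hourly files older than your budget allows.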


Where do I find it?

https://github.com/Percona-Lab/mongodb_consistent_backup

•  GPL license

•  Community participation is encouraged

•  Very actively developed for use across all our services, but not yet moved into Percona tools (not yet officially supported)

•  All issues go to me and the escalations team for MongoDB @ Percona

Questions? What more would you like to see? Other tools the community needs?

Twitter: @dmurphy_data @percona

GitHub: dbmurphy

mongodb_consistent_backup: http://bit.ly/28InDuI

DATABASE PERFORMANCE MATTERS
