Copyright 2017 Severalnines AB
MongoDB Monitoring
Art van Scheppingen, Senior Support Engineer, Severalnines
Become a MongoDB DBA - Monitoring Essentials
● Monitoring and trending
● Why do we collect data?
● What metrics to collect from MongoDB?
● Key MongoDB metrics in depth
● Available MongoDB monitoring tools
● How to monitor MongoDB using ClusterControl
Agenda
Monitoring and trending
Do you need monitoring and trending?
There is only one person who can land a plane without instruments
● Monitoring system (e.g. Nagios)
  ○ Checks if services are healthy
  ○ Sends pages
● Trending system (e.g. Cacti, Graphite, Prometheus)
  ○ Collects metrics
  ○ Generates graphs
Monitoring vs Trending
● Do more than just open a connection
  ○ Measure the true status of nodes and cluster
  ○ Test read/write
  ○ Open essential databases and collections
  ○ Keep an eye on the replication lag
    ■ Increase oplog size?
  ○ Check the full topology
Monitoring: Availability
● Trending
  ○ Plot trends of key (performance) metrics
  ○ Create timelines of metrics
  ○ Correlate various metrics
  ○ Find problems before they arise
  ○ Pre-emptive problem management
● Trending tools
  ○ Granularity of sampling
  ○ More datapoints = better
Trending: why do we need trends?
Why do we collect data?
● Periodic (daily/weekly) healthchecks
● Insight into all aspects of the database operations
● Post mortem and proactive monitoring
● Capacity planning
Why do we collect data?
● Healthchecks are a pain
● You want to see aggregated data
● You want to be able to drill down to a particular host
● You want to see the most important data first and dig in later on
Healthchecks
● Ability to dig into past data
● Sampling intervals of even less than 5s (hardware-dependent)
● Fine-grained sampling lets you catch an issue as it evolves, with no need to wait 5 minutes for a graph to refresh
Post mortem and proactive monitoring
● Graphs based on MongoDB status metrics
● Overall status and per-node graphs
● Ability to get time-shifted graphs, useful for comparing workload changes over time
Insight into internals, capacity planning
What metrics to collect from MongoDB?
● Quite similar to other database systems
  ○ Host metrics
  ○ Operational metrics
  ○ Storage engine metrics
  ○ Replication metrics
  ○ Shard metrics
Type of metrics to collect
● Similar to most other databases
● Understand the utilization of the machine
● Capacity planning
● Determine the type of an issue
  ○ I/O related?
  ○ CPU related?
  ○ Network related?
Host metrics: what for?
● CPU utilization (should I add more nodes to the cluster?)
● Network utilization (am I running out of bandwidth?)
● Ping (how badly does latency affect my MongoDB cluster?)
● Disk throughput and IOPS (am I within my hardware limits?)
● Disk space (do I have to plan for larger disks?)
● Memory utilization (do I suffer from a memory leak?)
Host metrics: what to look for?
● Similar to most other databases
● Throughput of the cluster
● Relate throughput to cluster performance
● Determine the type of an issue
  ○ Request spikes?
  ○ Write amplification related?
  ○ Queueing?
Operational metrics
● Storage engine specific
  ○ MMAP
  ○ WiredTiger
  ○ MongoRocks
● Insight into how the engine performs
● Internal congestion
Storage engine metrics
● Throughput of the replication
● Durability of the oplog
● Replication lag
● Cluster replication acknowledgement
  ○ Quorum based
  ○ At least one secondary needs to acknowledge
Replication metrics
Eventual consistency
● Shard chunks and balancing
  ○ Chunks per shard
  ○ Disk usage
● Non-sharded collections
  ○ Sharding has to be enabled on collection level
  ○ Non-sharded collections get a primary shard assigned
  ○ Once the primary shard is full, no writes can happen
● Connection pool (mongos)
  ○ All queries will be sent to the primary in a shard
  ○ Range queries will block connections of the connection pool
Sharding related metrics
Key MongoDB metrics to know about
● Oplog: a special collection containing all transactions
  ○ Limited in size (configurable)
  ○ Eviction of transactions (FIFO)
  ○ Comparable to a ringbuffer
● Used for replication
  ○ Secondaries copy transactions from the oplog on other nodes
  ○ Full data sync necessary once the last executed transaction has been evicted
● Replication window
  ○ Time between first and last transaction in the oplog
  ○ Time that allows your secondary to be offline before performing a full sync
Oplog
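The full-sync rule on this slide can be written as a single comparison. This is a minimal sketch in plain JavaScript; the function name and the use of plain epoch seconds are assumptions made for illustration, not part of MongoDB or ClusterControl.

```javascript
// Hypothetical helper: can a secondary still catch up from the oplog?
// oplogFirstTs    - timestamp (epoch seconds) of the oldest entry still in the oplog
// secondaryLastTs - timestamp (epoch seconds) of the last entry the secondary applied
function canResumeFromOplog(oplogFirstTs, secondaryLastTs) {
    // If the secondary's last applied entry is older than the oldest entry
    // still in the oplog, that entry has been evicted (FIFO) and only a
    // full data sync can bring the secondary back.
    return secondaryLastTs >= oplogFirstTs;
}
```

A secondary that was offline for longer than the replication window ends up on the `false` side of this check.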
From the MongoDB CLI:
mongo_replica_0:PRIMARY> db.getReplicationInfo()
{
    "logSizeMB" : 1895.7751951217651,
    "usedMB" : 419.86,
    "timeDiff" : 281419,
    "timeDiffHours" : 78.17,
    "tFirst" : "Fri Jul 08 2016 10:56:01 GMT+0000 (UTC)",
    "tLast" : "Mon Jul 11 2016 17:06:20 GMT+0000 (UTC)",
    "now" : "Mon Jul 11 2016 17:15:06 GMT+0000 (UTC)"
}
Oplog: replication window
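The output above can be turned into a simple check. The sketch below assumes an object shaped like the `db.getReplicationInfo()` result; the function name and the `minHours` threshold are hypothetical and should be chosen to match your own maintenance windows.

```javascript
// Sketch: evaluate the replication window from getReplicationInfo-style output.
// minHours is a hypothetical alert threshold, not a MongoDB setting.
function checkReplicationWindow(info, minHours) {
    var hours = info.timeDiff / 3600;              // timeDiff is reported in seconds
    return {
        hours: Math.round(hours * 100) / 100,      // rounded to two decimals
        usedRatio: info.usedMB / info.logSizeMB,   // share of the oplog currently in use
        ok: hours >= minHours                      // false means the window is too short
    };
}

// Values taken from the sample output on this slide
var windowCheck = checkReplicationWindow(
    { logSizeMB: 1895.7751951217651, usedMB: 419.86, timeDiff: 281419 }, 24);
```

If `ok` is false, increasing the oplog size is the usual remedy mentioned earlier in the deck.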
From the ClusterControl advisor:
function getReplicationWindow(host) {
    var replwindow = {};
    replwindow['newset'] = false;
    // Fetch the first and last record from the oplog and take its timestamp
    var res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: 1}, limit: 1}');
    replwindow['first'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
    // A freshly initiated replica set starts its oplog with this message
    if (res["result"]["cursor"]["firstBatch"][0]["o"]["msg"] == "initiating set") {
        replwindow['newset'] = true;
    }
    res = host.executeMongoQuery("local", '{find: "oplog.rs", sort: { $natural: -1}, limit: 1}');
    replwindow['last'] = res["result"]["cursor"]["firstBatch"][0]["ts"]["$timestamp"]["t"];
    // Replication window in seconds
    replwindow['replwindow'] = replwindow['last'] - replwindow['first'];
    return replwindow;
}
Oplog: replication window
● CPU, IO or lock related
● Outcome:
  ○ Secondary not used by Mongo client drivers
  ○ Puts larger strain on other secondaries
  ○ Less likely to be elected during a failover
    ■ If it is elected, it could be disastrous
  ○ Lagging too far behind could cause a full sync
Replication lag
my_mongodb_0:PRIMARY> db.runCommand( { replSetGetStatus: 1 } )
{
    …
    "members" : [
        {
            "_id" : 0,
            "name" : "10.10.32.11:27017",
            "stateStr" : "PRIMARY",
            "optime" : {
                "ts" : Timestamp(1466247801, 5),
                "t" : NumberLong(1)
            },
        },
        {
            "_id" : 1,
            "name" : "10.10.32.12:27017",
            "stateStr" : "SECONDARY",
            "optime" : {
                "ts" : Timestamp(1466247801, 5),
                "t" : NumberLong(1)
            },
        },
        …
    ],
    "ok" : 1
}
Replication lag
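Replication lag is the distance between the primary's optime and each secondary's optime. The sketch below works on a `members` array shaped like the output above, and assumes each member also carries the `optimeDate` field (a date) that `replSetGetStatus` reports next to `optime`. The function name is hypothetical.

```javascript
// Sketch: replication lag in seconds per secondary, relative to the primary.
// Assumes members[i].optimeDate is a Date, as reported by replSetGetStatus.
function replicationLag(members) {
    var primary = members.find(function (m) { return m.stateStr === "PRIMARY"; });
    if (!primary) {
        return null; // no primary visible, so there is nothing to compare against
    }
    var lag = {};
    members.forEach(function (m) {
        if (m.stateStr === "SECONDARY") {
            // Subtracting two Dates yields milliseconds
            lag[m.name] = (primary.optimeDate - m.optimeDate) / 1000;
        }
    });
    return lag;
}
```

In the sample output both members share the same optime, so the lag would be 0 seconds.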
● Like any other database: availability
● Client drivers may support connection pooling
  ○ Multiple non-blocking queries can use the same connection
  ○ Spawns new connections when the pool drops below a threshold
● Increase of connections
  ○ Locking issues
  ○ Application request bursts
Connections
From the MongoDB CLI
mongo_replica_0:PRIMARY> db.serverStatus().connections
{ "current" : 25, "available" : 794, "totalCreated" : NumberLong(122418) }
From any mongo client
mongo_replica_0:PRIMARY> db.runCommand( { serverStatus: 1 } ).connections
{ "current" : 25, "available" : 794, "totalCreated" : NumberLong(122418) }
Connections
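A single ratio is often easier to alert on than the raw counters. This sketch assumes the `connections` document shown above, where `current` plus `available` adds up to the configured maximum; the function name is hypothetical.

```javascript
// Sketch: fraction of the connection limit currently in use (0..1).
function connectionUsage(connections) {
    var max = connections.current + connections.available;
    return max === 0 ? 0 : connections.current / max;
}
```

For the sample output (25 current, 794 available) this is roughly 3% of the limit; a steadily rising value hints at locking issues or request bursts.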
● Atomicity on document level
  ○ WiredTiger and MongoRocks
● No “real” transactions
● Write data with the $isolated operator
  ○ Similar to READ UNCOMMITTED in MySQL (dirty reads in ANSI SQL)
  ○ No rollback
  ○ Does not work on shards
Transactions
Transactions
From the MongoDB CLI:
mongo_replica_0:PRIMARY> db.serverStatus().opcounters
{
    "insert" : 1355272,
    "query" : 20712,
    "update" : 8995,
    "delete" : 0,
    "getmore" : 400791,
    "command" : 2405749
}
From any mongo client:
mongo_replica_0:PRIMARY> db.runCommand({serverStatus: 1}).opcounters
{
    "insert" : 1355272,
    "query" : 20712,
    "update" : 8995,
    "delete" : 0,
    "getmore" : 400791,
    "command" : 2405749
}
Transactions
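The opcounters are cumulative since the server started, so a trending system needs the difference between two samples. A minimal sketch with plain numbers follows; the function name and the sampling interval are assumptions.

```javascript
// Sketch: operations per second between two opcounters samples.
// previous/current are opcounters documents, intervalSeconds the time between them.
function opRates(previous, current, intervalSeconds) {
    var rates = {};
    Object.keys(current).forEach(function (op) {
        var delta = current[op] - (previous[op] || 0);
        // A negative delta means the counters started over (for example after
        // a restart); report 0 instead of a misleading negative rate.
        rates[op] = delta < 0 ? 0 : delta / intervalSeconds;
    });
    return rates;
}
```

Sampling every few seconds and plotting the resulting rates gives the throughput graphs discussed under operational metrics.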
● Three levels of (generic) locking
  ○ Global
  ○ Database
  ○ Collection
● Global lock hardly ever happens (full lock on MongoDB)
● Database locks occur when dropping a collection
● Collection locks occur mostly in MMAP
Locks
From the MongoDB CLI:
mongo_replica_0:PRIMARY> db.serverStatus().locks
{
    "Global" : {
        "acquireCount" : {
            "r" : NumberLong(6050583),
            "w" : NumberLong(2416551),
            "R" : NumberLong(1),
            "W" : NumberLong(7)
        },
        "acquireWaitCount" : {
            "r" : NumberLong(1),
            "w" : NumberLong(1),
            "W" : NumberLong(1)
        },
    …
}
Locks (generic)
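The ratio of lock waits to lock acquisitions says more than either counter alone. The sketch below takes one lock level (for example the `Global` document above) and assumes the counters have already been converted to plain numbers, since the shell reports them as `NumberLong`. The function name is hypothetical.

```javascript
// Sketch: share of lock acquisitions that had to wait, per lock mode.
// lockStats is one level of serverStatus().locks with plain-number counters.
function lockWaitRatios(lockStats) {
    var acquired = lockStats.acquireCount || {};
    var waited = lockStats.acquireWaitCount || {};
    var ratios = {};
    Object.keys(acquired).forEach(function (mode) {
        var total = acquired[mode];
        var waits = waited[mode] || 0;   // modes that never waited are absent
        ratios[mode] = total === 0 ? 0 : waits / total;
    });
    return ratios;
}
```

In the sample output only 1 of 7 exclusive global (`W`) acquisitions waited, and the read and write intent locks almost never did.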
● Optimistic concurrency control
  ○ If two write operations conflict, the transaction will be paused and retried
● Document level locking
● Tickets (threads)
  ○ Read
  ○ Write
Locks (WiredTiger)
From the MongoDB CLI:
mongo_replica_0:PRIMARY> db.serverStatus().wiredTiger.concurrentTransactions
{
    "write" : {
        "out" : 0,
        "available" : 128,
        "totalTickets" : 128
    },
    "read" : {
        "out" : 0,
        "available" : 128,
        "totalTickets" : 128
    }
}
Locks (WiredTiger)
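Ticket usage can be expressed as the share of tickets that are handed out. This is a small sketch over the `concurrentTransactions` document shown above; the function name is hypothetical.

```javascript
// Sketch: fraction of WiredTiger read and write tickets currently in use (0..1).
function ticketUsage(concurrentTransactions) {
    var usage = {};
    ["read", "write"].forEach(function (kind) {
        var t = concurrentTransactions[kind];
        usage[kind] = t.totalTickets === 0 ? 0 : t.out / t.totalTickets;
    });
    return usage;
}
```

A value that stays close to 1 means operations are queueing for tickets, which is the internal congestion mentioned under storage engine metrics.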
● MongoDB uses three tiers of cache
  ○ Filesystem
  ○ Active memory
  ○ Storage engine (WiredTiger / MongoRocks)
● Page faults
  ○ Cache miss
● Evictions
Cache
From the MongoDB CLI:
mongo_replica_0:PRIMARY> db.serverStatus().extra_info.page_faults
37912924
mongo_replica_0:PRIMARY> db.serverStatus().wiredTiger.cache
{
    "bytes currently in the cache" : 887889617,
    "modified pages evicted" : 561514,
    "tracked dirty pages in the cache" : 626,
    "unmodified pages evicted" : 15823118
}
Page faults and cache usage
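Two derived values help when reading the cache statistics: how full the cache is and what share of evictions concerned modified pages. The sketch below reads the fields shown above plus `maximum bytes configured`, a field of the same `wiredTiger.cache` document that the slide's abbreviated output does not show. The function name is hypothetical.

```javascript
// Sketch: derived WiredTiger cache figures from serverStatus().wiredTiger.cache.
// Assumes the document also contains "maximum bytes configured".
function cacheSummary(cache) {
    var used = cache["bytes currently in the cache"];
    var max = cache["maximum bytes configured"];
    var modified = cache["modified pages evicted"];
    var unmodified = cache["unmodified pages evicted"];
    var evicted = modified + unmodified;
    return {
        // null when the maximum is not available in the input
        fillRatio: max > 0 ? used / max : null,
        // share of evicted pages that were dirty and had to be written out
        modifiedEvictionShare: evicted > 0 ? modified / evicted : 0
    };
}
```

Like the other counters, the eviction figures are cumulative, so trending the difference between samples is more telling than a single reading.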
● Shards make write scaling transparent
● Sharding can be solved with two methods:
  ○ Hash key distribution (limited)
  ○ Shard lookup table
● MongoDB uses a combination of hash key distribution and shard lookup table
  ○ Hash key (or range key) distribution gets divided into chunks (ranges)
  ○ The chunk metadata gets stored in the config server
● The config server holds the most important data in a MongoDB sharded cluster!
● The shard router is the second most important component
● Shards can get out of balance
  ○ Non-sharded collections
  ○ Heavy / large writes on a single chunk
  ○ Auto balancing by the primary of the config server (3.4) or mongos (3.2 and earlier)
Shard metrics
From the MongoDB CLI:
mongos> sh.status()
--- Sharding Status ---
…
databases:
    { "_id" : "shardtest", "primary" : "sh1", "partitioned" : true }
        shardtest.collection
            shard key: { "_id" : 1 }
            unique: false
            balancing: true
            chunks:
                sh1 1
                sh2 2
                sh3 1
From any mongo client:
mongos> use config
switched to db config
mongos> db.config.runCommand({aggregate: "chunks", pipeline: [{$group: {"_id": {"ns": "$ns", "shard": "$shard"}, "total_chunks": {$sum: 1}}}]})
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_1" }, "total_chunks" : 330 }
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_0" }, "total_chunks" : 328 }
{ "_id" : { "ns" : "test.usertable", "shard" : "mongo_replica_2" }, "total_chunks" : 335 }
Shard chunks and balancing
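To judge the balance, compare the largest and smallest chunk count per namespace. The sketch below takes rows shaped like the aggregation result above; the function name is hypothetical. Note that a shard holding no chunks at all does not appear in that result, so it would need to be added separately.

```javascript
// Sketch: difference between the most and least loaded shard, per namespace.
// rows: [{ _id: { ns: "...", shard: "..." }, total_chunks: N }, ...]
function chunkSpread(rows) {
    var perNs = {};
    rows.forEach(function (row) {
        var ns = row._id.ns;
        var n = row.total_chunks;
        if (!perNs[ns]) {
            perNs[ns] = { min: n, max: n };
        } else {
            perNs[ns].min = Math.min(perNs[ns].min, n);
            perNs[ns].max = Math.max(perNs[ns].max, n);
        }
    });
    var spread = {};
    Object.keys(perNs).forEach(function (ns) {
        spread[ns] = perNs[ns].max - perNs[ns].min;
    });
    return spread;
}
```

For the sample rows the spread for `test.usertable` is 335 - 328 = 7 chunks, which is small compared with roughly 330 chunks per shard.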
From the ClusterControl non-sharded collection advisor:
use config;
// Collect the namespaces (database.collection) of all sharded collections
var shard_collections = db.collections.find();
var sharded_names = {};
while (shard_collections.hasNext()) {
    var shard = shard_collections.next();
    sharded_names[shard._id] = 1;
}
// Walk every database and print the collections that are not sharded
var admin_db = db.getSiblingDB("admin");
var dbs = admin_db.runCommand({ "listDatabases": 1 }).databases;
dbs.forEach(function(database) {
    if (database.name != "config") {
        var current_db = db.getSiblingDB(database.name);
        var cols = current_db.getCollectionNames();
        cols.forEach(function(col) {
            if (col != "system.indexes") {
                if (sharded_names[database.name + "." + col] != 1) {
                    print(database.name + "." + col);
                }
            }
        });
    }
});
Non-sharded collections
From the MongoDB CLI:
mongos> db.runCommand( { "connPoolStats" : 1 } )
{
    "numClientConnections" : 10,
    "numAScopedConnections" : 0,
    "totalInUse" : 4,
    "totalAvailable" : 8,
    "totalCreated" : 23,
    "hosts" : {
        "10.10.34.11:27019" : {
            "inUse" : 1,
            "available" : 1,
            "created" : 1
        },
        "10.10.34.12:27018" : {
            "inUse" : 3,
            "available" : 1,
            "created" : 2
        }
    },
    ...
    "ok" : 1
}
Connection pool
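The per-host figures show which backend the mongos pool is busiest with. This sketch works on a document shaped like the `connPoolStats` output above; the function name is hypothetical.

```javascript
// Sketch: share of pooled connections in use per host (0..1).
function poolUsageByHost(stats) {
    var usage = {};
    Object.keys(stats.hosts || {}).forEach(function (host) {
        var h = stats.hosts[host];
        var total = h.inUse + h.available;
        usage[host] = total === 0 ? 0 : h.inUse / total;
    });
    return usage;
}
```

In the sample output, 3 of the 4 pooled connections to 10.10.34.12:27018 are in use, against 1 of 2 for 10.10.34.11:27019.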
Available MongoDB monitoring tools
● Open Source
  ○ Nagios
  ○ Zabbix
● Subscription based
  ○ MongoDB Cloud Manager
  ○ VividCortex
  ○ ClusterControl
Alerting solutions
● Nagios-MongoDB
  ○ https://github.com/mzupan/nagios-plugin-mongodb/
  ○ Performs some very important checks
    ■ Replication lag
    ■ Lock time percentage
    ■ Index miss ratio
Nagios
● MongoDB Zabbix monitoring plugin
  ○ https://github.com/nightw/mikoomi-zabbix-mongodb-monitoring
  ○ All the necessary metrics and more
    ■ Entries in oplog
  ○ Pre-canned triggers
Zabbix
● Trending tools
  ○ Statsd/Grafana
  ○ Cacti
  ○ Zabbix
● Subscription based
  ○ MongoDB Cloud Manager
  ○ VividCortex
  ○ ClusterControl
Trending solutions
● Percona MongoDB Monitoring Templates
  ○ https://www.percona.com/doc/percona-monitoring-plugins/1.1/cacti/mongodb-templates.html
Cacti
● PMM
  ○ https://www.percona.com/doc/percona-monitoring-and-management/
  ○ Open Source Monitoring & Management framework
  ○ Can deploy, manage and monitor MySQL & MongoDB
  ○ Uses Prometheus and Grafana
Percona Monitoring & Management sessions:
● MySQL Monitoring with Percona Monitoring and Management, Tue 11:30 - 12:20 in Ballroom E
● Hipster MySQL Monitoring: Serving a deconstructed PMM, Tue 11:30 - 12:20 in Ballroom H
● Monitoring production environment with Percona Monitoring and Management (PMM), Thu 3:00 - 3:50 in room 209
Orchestration systems: Percona Monitoring & Management
How to monitor MongoDB using ClusterControl
● ClusterControl
  ○ http://www.severalnines.com
  ○ Deploy Mongo shards & replicaSets
  ○ Monitor and trend
  ○ Manage configuration and backups
  ○ Scale
  ○ Community edition
Orchestration systems: ClusterControl
Easily deploy and import MongoDB replicaSets and Shards
Monitor and trend
Cluster management
Scale replicaSets and Shards
Convert replicaSet into a Sharded cluster
Q & A
● Blog series: Become a MongoDB DBA
  ○ http://severalnines.com/blog-categories/mongodb
● Webinar series: Become a MongoDB DBA
  ○ http://severalnines.com/upcoming-webinars
● Visit our website for more resources!
  ○ http://www.severalnines.com
● Other sessions by Severalnines at Percona Live 2017
  ○ MySQL Load Balancers - MaxScale, ProxySQL, HAProxy, MySQL Router & nginx - a close up look, Wed 11:10am - 12:00pm in Ballroom D
  ○ MySQL (NDB) Cluster Best Practices (Die Hard VIII), Wed 3:30pm - 4:20pm in Room 210
Additional resources
Thank You!