mysql replication, the community sceptic roundup‣ mysql 5.6 and 5.7 ‣ not active by default ‣...
TRANSCRIPT
MySQL Replication, the Community Sceptic RoundupGiuseppe Maxia
Quality Assurance Architect
at VMware
@datacharmer1
Who’s this guy?About me
‣ Giuseppe Maxia, a.k.a. "The Data Charmer" • QA Architect at VMware
• 25+ years development and DB experience
• Long timer MySQL community member.
• Oracle ACE Director
• Blog: http://datacharmer.blogspot.com
• Twitter: @datacharmer
2
A
SKEPTIC?
3
SKEPTIC?
Features are announced. But not always they are usable. We verify every claim.
4
What will we see in this sessionSummary
‣ Global Transaction Identifiers
‣ Multi source replication
‣ Parallel replication
‣ Group replication
5
We will see practical examples with the following systemsActors
‣ MySQL 5.6.29+
‣ MySQL 5.7.12+
‣ MySQL 8.0.1
‣ MariaDB 10.0.20
‣ MariaDB 10.1.13
6
The most important reason:Focus on monitoring
‣ Replication will fail, sooner or later.
‣ Good monitoring metadata is what can tell you what the problem is (before it happens)
7
Global Transaction Identifiers
8
You think you know where your transactions are … until something unexpected happens
Transactions blues
‣ Problem: • MySQL replication identifies transactions with a
combination of binary log file name and offset position;
• When using many possible masters, file names and positions may differ.
• Practical cases: failover, circular replication, hierarchical replication
‣ Solution: use a global ID, not related to the file name and position
9
Transaction problem in a nutshell (1)
10
host1
host2 host3
master
slave
slave
slave
host4 host5
slave
binlog 120pos 5600
binlog 87pos 15
host6
binlog 120pos 5570
binlog 120pos 3400
binlog 189pos 932
slave
Transaction problem with GTID (1)
11
host1
host2 host3
master
slave
slave
slave
host4 host5
slave
GTID 786
GTID 785
host6
GTID 785 GTID 781
GTID 781
slave
A half baked feature, which kind of worksImplementation: (1) MySQL 5.6 & 5.7
‣ Made of server UUID + transaction ID • (e.g.: “e8679838-b832-11e3-b3fc-017f7cee3849:1”)
‣ Only transactional engines
‣ No “create table … select …” supported
‣ No temporary tables within transactions
‣ Requires log-slave-updates in all nodes (removed in 5.7)
12
A half baked feature, which kind of worksImplementation: (1) MySQL 5.6 & 5.7
‣ The good • GTID are easily parseable by scripts in the binlog
• Failover and transaction tracking are easier
‣ The bad • Not enabled by default
• Hard to read for humans!
• Little integration between GTID and existing software (ignored in crash-safe tables, parallel replication)
• makes log-slave updates mandatory (only in 5.6)13
Something was changed ...GTID in MySQL 5.7.6+
‣ GTID can now be enabled dynamically.
‣ However, it requires a 9 (NINE!) steps procedure.
‣ http://mysqlhighavailability.com/enabling-gtids-without-downtime-in-mysql-5-7-6/
14
MySQL 5.7: What you see in the master
show master status\G File: mysql-bin.000001 Position: 1033 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: d9f8aeb1-ff3a-11e5-a3d1-0242ac110002:1-4
show global variables like 'gtid_executed'\G Variable_name: gtid_executed Value: d9f8aeb1-ff3a-11e5-a3d1-0242ac110002:1-4 1 row in set (0.00 sec)
15
Excerpt from SHOW SLAVE STATUSMySQL 5.7: What you see in the slave
[...] Master_Server_Id: 100 Master_UUID: d9f8aeb1-ff3a-11e5-a3d1-0242ac110002 Master_Info_File: mysql.slave_master_info [ ... ] Retrieved_Gtid_Set: d9f8aeb1-ff3a-11e5-a3d1-0242ac110002:1-4 Executed_Gtid_Set: d9f8aeb1-ff3a-11e5-a3d1-0242ac110002:1-4
16
Note: we have two pieces of information: * retrieved * executed
No GTID info in mysql.slave_relay_log_infoMySQL 5.7: What you see in the slave
select * from slave_relay_log_info\G *************************** 1. row *************************** Number_of_lines: 7 Relay_log_name: ./mysql-relay.000002 Relay_log_pos: 1246 Master_log_name: mysql-bin.000001 Master_log_pos: 1033 Sql_delay: 0 Number_of_workers: 0 Id: 1 Channel_name: 1 row in set (0.00 sec)
17
More on this topic when we discuss monitoring
A well thought feature, with some questionable choices Implementation (2) MariaDB 10
‣ Made of domain ID+server ID + number • e.g. (0-101-10)
‣ Enabled by default
‣ Uses a crash-safe table
‣ No limitations
‣ Lack of integration with old replication coordinates.
18
MariaDB 10.0: What you see in the master
show master status\G File: mysql-bin.000001 Position: 3139 Binlog_Do_DB: Binlog_Ignore_DB:
show variables like '%gtid%pos'; +------------------+--------+ | Variable_name | Value | +------------------+--------+ | gtid_binlog_pos | 0-1-14 | | gtid_current_pos | 0-1-14 | | gtid_slave_pos | | +------------------+--------+
19
MariaDB 10.0: What you see in the slave
[ ... ] Using_Gtid: Current_Pos Gtid_IO_Pos: 0-1-14 Replicate_Do_Domain_Ids: Replicate_Ignore_Domain_Ids: [ ... ]
20
Excerpt from SHOW SLAVE STATUS
Note: we have only one piece of information: * IO_Pos ( = retrieved)
MariaDB 10.0: What you see in the slave
select * from mysql.gtid_slave_pos; +-----------+--------+-----------+--------+ | domain_id | sub_id | server_id | seq_no | +-----------+--------+-----------+--------+ | 0 | 13 | 1 | 13 | | 0 | 14 | 1 | 14 | +-----------+--------+-----------+--------+
21
Table in mysql schema
Note: we have only one piece of information related to the execution of the transaction identified by the GTID
Claim: global transaction identifiers
‣ Claimed by ‣ MySQL 5.6 and 5.7 ‣ MariaDB 10.0 and 10.1
22
Sceptic assessment: global transaction identifiers
‣ MySQL 5.6 and 5.7 ‣ Not active by default ‣ Unfriendly for humans ‣ Lack of integration with other features
‣ MariaDB 10.0 and 10.1 ‣ Friendlier then MySQL 5.6/5.7 ‣ Insufficient info for monitoring
23
CAN DO MUCH BETTER!
Monitoring (MySQL 5.6+ - MariaDB 10)
24
All replication data should be now in tablesThe new trend : using tables to monitor
‣ Both MySQL and MariaDB 10 can monitor replication using tables.
‣ But not all data is available
25
There are tables that can replace files, and SHOW statements ... up to a point
MySQL 5.6 crash-safe tables
‣ up to 5.5: • SQL in the slave
- show slave status
• SQL in the master
- show master status
26
‣ 5.6 & 5.7: ‣ Tables in the slave
‣ slave_master_info
‣ slave_relay_log_info
‣ slave_worker_info
‣ performance_schema (5.7)
‣ SQL in the master
‣ show master status
‣ select @@global.gtid_executed
Very detailed, but designed in different stagesMySQL tables
‣ One table replaces the file master.info
‣ Another replaces relay-log.info
‣ They were designed before introducing GTID
‣ There is NO GTID in these tables
‣ They are NOT updated continuously
27
Performance Schema helps with monitoringMySQL 5.7 additional tables in the slave
‣ replication_applier_configuration
‣ replication_applier_status
‣ replication_applier_status_by_coordinator
‣ replication_applier_status_by_worker
‣ replication_connection_configuration
‣ replication_connection_status
28
Despite all these tables, not all info from SHOW SLAVE STATUS is available
Some good newsMySQL 8.0.1 addition
‣ More info on • replication_connection_status (IO_thread)
• replication_applier_status_by_worker (SQL_thread)
29
A complete redesign of the monitoring system, integrated with GTID
MariaDB 10 crash-safe tables
‣ up to 5.5: • SQL in the slave
- show slave status
• SQL in the master
- show master status
30
‣ 10.0 • Table in the slave
- gtid_slave_pos
• SQL in the master
- show master status
- select @@gtid_current_pos
in the mysql databaseMySQL 5.7: tables in the slave
select * from slave_relay_log_info\G *************************** 1. row******* Number_of_lines: 7 Relay_log_name: ./mysql-relay.000002 Relay_log_pos: 1246 Master_log_name: mysql-bin.000001 Master_log_pos: 1033 Sql_delay: 0 Number_of_workers: 0 Id: 1 Channel_name: 1 row in set (0.00 sec)
31
in the mysql databaseMySQL 5.7: tables in the slave
select * from mysql.slave_master_info\G *************************** 1. row ******************* Number_of_lines: 25 Master_log_name: mysql-bin.000001 Master_log_pos: 154 Host: 172.17.0.2 User_name: rdocker User_password: rdocker Port: 3306 Connect_retry: 60 Enabled_ssl: 0 [...] Heartbeat: 30 Ignored_server_ids: 0 Uuid: f4c64510-ff4c-11e5-80f9-0242ac110002 Retry_count: 86400
32
in the performance_schema databaseMySQL 5.7: tables in the slave
select * from replication_applier_configuration\G *************************** 1. row ***************** CHANNEL_NAME: DESIRED_DELAY: 0 1 row in set (0.00 sec)
select * from replication_applier_status\G *************************** 1. row ***************** CHANNEL_NAME: SERVICE_STATE: ON REMAINING_DELAY: NULL COUNT_TRANSACTIONS_RETRIES: 0
33
in the performance_schema databaseMySQL 5.7: tables in the slave
select * from replication_applier_status_by_coordinator\G Empty set (0.00 sec)
select * from replication_connection_configuration\G CHANNEL_NAME: HOST: 172.17.0.2 PORT: 3306 USER: rdocker NETWORK_INTERFACE: AUTO_POSITION: 1 SSL_ALLOWED: NO [ ... ]
34
in the performance_schema databaseMySQL 5.7: tables in the slave
select * from replication_connection_status\G *************************** 1. row *************************** CHANNEL_NAME: GROUP_NAME: SOURCE_UUID: f4c64510-ff4c-11e5-80f9-0242ac110002 THREAD_ID: 33 SERVICE_STATE: ON COUNT_RECEIVED_HEARTBEATS: 12 LAST_HEARTBEAT_TIMESTAMP: 2016-04-10 18:55:56 RECEIVED_TRANSACTION_SET: f4c64510-ff4c-11e5-80f9-0242ac110002:1-4 LAST_ERROR_NUMBER: 0 LAST_ERROR_MESSAGE: LAST_ERROR_TIMESTAMP: 0000-00-00 00:00:00
35
Note: we have only one piece of information related to the received transaction
in the performance_schema databaseMySQL 8.0.1: tables in the slave
36
Claim: Monitoring in crash-safe tables
‣ Claimed by ‣ MySQL 5.6, 5.7, and 8.0 ‣ MariaDB 10.0 and 10.1
37
Sceptic assessment: monitoring in crash-safe tables‣ Both: ‣ (+) Yes. The slave is crash safe ‣ (-) No replication info tables in the master ‣ (-) Split info about received and executed data
‣ MySQL 5.6, 5.7, and 8.0 ‣ (-) Lack of integration with other features ‣ (-) Only SHOW SLAVE STATUS has the full picture
‣ MariaDB 10.0 and 10.1 ‣ (-) Insufficient info for monitoring ‣ (-) Insufficient data in SHOW SLAVE STATUS
38
CAN DO MUCH, MUCH BETTER!
Multi-source replication
39
The dream of every DBA is to have a group of database servers that behave like a single server
What is it?
‣ Traditional replication allows master/slave and chain replication (a.k.a. circular or ring)
‣ Up to MySQL 5.6, a slave cannot have more than one master.
‣ Multi source is the ability of replicating from more than one master at once.
‣ Implemented in Tungsten Replicator (2009), MySQL 5.7 (2015), MariaDB 10 (2013).
40
Introduced in MySQL 5.7.7Implementation (1) MySQL 5.7
‣ New syntax: CHANGE MASTER TO … FOR CHANNEL “name”
‣ SHOW SLAVE STATUS FOR CHANNEL “name”
‣ START/STOP SLAVE FOR CHANNEL “name”
‣ Includes replication tables in performance_schema
‣ Requires GTID and crash-safe tables to be enabled
41
Setting several channelsMySQL 5.7 example
CHANGE MASTER TO MASTER_HOST='foo.example.com', MASTER_PORT=3306, MASTER_USER='repl_user', MASTER_PASSWORD='repl_pass', MASTER_AUTO_POSITION=1 for channel 'sl_foo'; START SLAVE for channel 'sl_foo';
CHANGE MASTER TO MASTER_HOST='bar.example.com', MASTER_PORT=3306, MASTER_USER='repl_user', MASTER_PASSWORD='repl_pass', MASTER_AUTO_POSITION=1 for channel 'sl_bar' START SLAVE for channel 'sl_bar';
42
Now GA, the multi source was well planned and executedimplementation (2) : MariaDB 10
‣ New syntax “CHANGE MASTER “name” …”
‣ START/STOP/RESET SLAVE “name”
‣ SHOW SLAVE “name” STATUS
‣ SHOW ALL SLAVES STATUS
43
Setting several channelsMariaDB 10.1 example
CHANGE MASTER 'sl_foo' TO MASTER_HOST='foo.example.com', MASTER_PORT=3306, MASTER_USER='repl_user', MASTER_PASSWORD='repl_pass', MASTER_USE_GTID=current_pos; START SLAVE 'sl_foo';
CHANGE MASTER 'sl_bar' TO MASTER_HOST='bar.example.com', MASTER_PORT=3306, MASTER_USER='repl_user', MASTER_PASSWORD='repl_pass', MASTER_USE_GTID=current_pos; START SLAVE 'sl_bar';
44
When the data is applied, saved to a binary log, and then replicated again, we have a full slave replay
Full slave replay (circular)
45
Allows data flow where the replicated data is applied only oncePoint-to-point replication
46
point-to-point all-masters replication
SHOW SLAVE STATUS\GMulti-source replication monitoring
## ONE REC FOR EACH MASTER [...] Retrieved_Gtid_Set: 00016003-3333-3333-3333-333333333333:1-4 Executed_Gtid_Set: 00016001-1111-1111-1111-111111111111:1-4, 00016002-2222-2222-2222-222222222222:1-4, 00016003-3333-3333-3333-333333333333:1-4
[...]
47
SHOW MASTER STATUS\GMulti-source replication monitoring
## Which set was created and which one was received?
*************************** 1. row *************************** File: mysql-bin.000001 Position: 1005 Binlog_Do_DB: Binlog_Ignore_DB: Executed_Gtid_Set: 00016001-1111-1111-1111-111111111111:1-4, 00016002-2222-2222-2222-222222222222:1-4, 00016003-3333-3333-3333-333333333333:1-4
48
Claim: Multi source replication
‣ Claimed by ‣ MySQL 5.7 ‣ MariaDB 10.0 and 10.1
49
Sceptic assessment: Multi source replication
‣ Both: ‣ (+) Yes. You can run multi-source replication; ‣ (+) SHOW SLAVE STATUS with many rows; ‣ (+) Monitoring tables with many rows ‣ (-) Mixed info about data created and received
50
CAN DO MUCH BETTER!
MySQL multi-source issues
‣ (-) Same issues for single stream, but worsened by multiple channels
‣ (+) SHOW SLAVE STATUS has a separate item for each channel.
‣ (-) GTID info is repeated as a group for every channel
‣ (-) show master status mixes up info about the data created and received
51
MariaDB multi-source issues
‣ (-) Same issues for single stream, but worsened by multiple channels
‣ (-) Syntax is different from MySQL
‣ (+) SHOW ALL SLAVES STATUS has a separate item for each channel.
‣ (-) GTID info is repeated as a group for every channel
‣ (-) GTID info in SHOW SLAVE STATUS include data created in the server.
52
Parallel replication
53
When the slave lags, using parallel threads may speed up things
Parallel apply
‣ It’s the ability of executing binary log events in parallel.
‣ Implemented in Tungsten Replication (2011, schema based), MySQL 5.6 (2012, schema based), MariaDB 10 (2013, boundless), MySQL 5.7 (2013, boundless)
54
Single vs parallel
55
The granddaddy of parallel replication, happily deployed in production for years
Implementation (1) Tungsten Replicator
‣ Based on schema boundaries.
‣ No risk of deadlocks.
‣ Can be shared by criteria other than database, but only during provisioning.
‣ Fully integrated in the instrumentation;
‣ Provides extra information for monitoring and troubleshooting
56
57
The first integrated solution for parallel replicationImplementation (2) MySQL 5.6
‣ Schema based, same as Tungsten.
‣ Requires both master and slave of the same version;
‣ No integration with GTID;
‣ No extra instrumentation.
58
Breaking the schema barriersImplementation (3) MySQL 5.7
‣ Not schema based. Parallelism is defined by extra metadata from the master (logical clock).
‣ Requires both master and slave of the same version;
‣ Uses monitoring tables in performance schema
‣ Limited troubleshooting info;
‣ With multi-source, it’s all or nothing
59
60
The latest contenderImplementation (4) MariaDB 10
‣ Not schema based. Uses information from the coordinator to define how to parallelise;
‣ Integrated with GTID;
‣ Little instrumentation for troubleshooting.
‣ You can choose to which channel to apply (set default_master_connection='x').
61
62
A new algorithm for parallel replicationNew development in MariaDB 10.1
‣ Optimistic parallelisation
‣ Does not require preparation in the master
63
Looking for performance, sometimes it's deceivingParallel replication expectations
‣ Performance depends on data distribution.
‣ Same data can have different performance on various methods.
‣ Slave resources and tuning affect reliability.
64
Claim: parallel replication
‣ Claimed by ‣ MySQL 5.6, 5.7, and 8.0 ‣ MariaDB 10.0 and 10.1
65
Sceptic assessment: parallel replication
‣ Both: ‣ (+) Yes. You can improve performance with parallel
replication; ‣ (-) There is LITTLE support for monitoring;
‣ MySQL 5.7 ‣ Some improvement in monitoring. Better info on failure
‣ MySQL 8.0.1 ‣ + info on monitoring. Split between received/executed ‣ MariaDB 10.x ‣ Terrible instrumentation: like driving in the dark
66
NEEDS BETTER METADATA!
Group replication
67
New in MySQL 5.7.17+ and 8.0.1Group replication
‣ It's the basis for High availability solutions (single master)
‣ or it can be used as an all-masters solution
68
Many changes herePrinciples
‣ SYNCHRONOUS distribution of transactions
‣ But ASYNCHRONOUS commit (with eventual rollback in case of conflict)
‣ GTID is not per server but per cluster
‣ SHOW SLAVE STATUS does not work
‣ Multi-source channels used differently (or not at all)
‣ More tables dedicated to nodes
69
performance_schema is richerAdded and removed
‣ two tables in performance_schema • replication_group_members
• replication_group_member_stats
‣ innodb cluster adds one more schema! • "mysql_innodb_cluster_metadata" with 6 tables
70
With innodb cluster we have tables in three places: * mysql * performance_schema * mysql_innodb_cluster_metadata
Supporting material and software
http://bit.ly/my-rep-samples (or check 'datacharmer' on GitHub)
71
Useful links
‣ GTID in MySQL
‣ Performance_schema tables for replication
‣ GTID in MariaDB
‣ Multi Source in MySQL
‣ Multi Source in MariaDB
‣ Parallel Replication in MariaDB
72
Q&A
73