mariadb galera cluster - percona...13 inconsistency for master-slave master node is always the...
TRANSCRIPT
MariaDB Galera ClusterDealing with Inconsistency Issues
Seppo JaakolaCodership
2
www.galeracluster.com
➢ Seppo Jaakola➢ One of the Founders of Codership
➢ Codership – Galera Replication developers➢ Partner of MariaDB for developing and supporting MariaDB
Galera Cluster➢ Galera releases since 2009
3
www.galeracluster.com
Galera Project
WSREP API
MySQLCommunity edition
Galera Replication Plugin
Galera Cluster for MySQL
WSREP API
MariaDB
merge
MariaDB Galera Cluster
R&D
4
www.galeracluster.com
MariaDB Galera Cluster
● MGC releases based on MariaDB 5.5 and 10.0
● Since MariaDB 10.1, Galera is inbuilt in MariaDB● If wsrep_provider is not specified, works as native MariaDB
● And will be present in MariaDB 10.2, 10.3...
R&D
5
www.galeracluster.com
Agenda
● Master-Slave vs Multi-Master Replication Topologies
● Galera Replication Overview● Reasons for Inconsistencies
● Bad SST● Cluster misuse
● Effect – the Harm Done● How to Detect Inconsistencies● Inconsistency Recovery● Galera Inconsistency Voting Protocol
Inconsistency in Master-Slave and Multi-Master Topologies
7
www.galeracluster.com
Master-Slave Async Replication
MariaDBMaster
MariaDBSlave
Relaylog
Binlog
MariaDBSlave
Relaylog
8
www.galeracluster.com
Master-Slave Async Replication
read & write read only
MariaDBMaster
MariaDBSlave
Relaylog
Binlog
MariaDBSlave
Relaylog
9
www.galeracluster.com
Master-Slave Async Replication
read & write read only
MariaDBMaster
MariaDBSlave
Relaylog
Binlog
MariaDBSlave
Relaylog
Master node is trusted dataSource
And is used for recovery
10
www.galeracluster.com
Master-Slave Async Replication
read & write
MariaDBMaster
MariaDBSlave
Relaylog
Binlog
MariaDBSlave
Relaylog
If slave nodes arenot used for reading,
No immediate inconsistency issues
11
www.galeracluster.com
Master-Slave Async Replication
read & write read only
MariaDBMaster
MariaDBSlave
Relaylog
Binlog
MariaDBSlave
Relaylog
If slave nodes get readsBut reads are not used for
anything critical,No problems for business logic
12
www.galeracluster.com
Master-Slave Async Replication
MariaDBMaster
MariaDBSlave
Relaylog
Binlog
MariaDBSlave
Relaylog
If reads from slave nodes areused as input for writes,
Inconsistency may violate business logic
read only write
13
www.galeracluster.com
Inconsistency for Master-Slave
● Master node is always the trusted data source● Recovery from inconsistency will be straight forward
● If slave nodes are not used for reading, inconsistency issue does not cause immediate problem
● If slaves are used for reading, application will get false data
● But this is safe, is these reads are not used for anything critical
● If slave read results are trusted, and used as input for later write operations business logic, the application business logic will be violated as well
14
www.galeracluster.com
MariaDB MariaDB MariaDB
Multi-Master Replication
Galera ReplicationReplication is synchronous
read & write read & write read & write
Client can connect to any node
There can be several nodes
Read & write access to any node
15
www.galeracluster.com
MariaDB
Multi-Master Replication
a
Multi-master cluster looks like one big database with multiple entry points
read & write read & write read & write
16
www.galeracluster.com
MariaDB
Multi-Master Replication
a
Adding more nodesOpens new connectionports
read & write read & writeread & write read & write
18
www.galeracluster.com
Inconsistency for Multi-Master
● The effect of inconsistency more prominent● Inconsistency may immediately violate the business logic
19
www.galeracluster.com
MariaDB MariaDB MariaDB
Inconsistency for Multi-Master
Galera Replication
read & write read & write read & write
Reads from compromised node will immediately hurt business logic
20
www.galeracluster.com
MariaDB MariaDB MariaDB
Inconsistency for Multi-Master
Galera Replication
read & write read & write read & write
Reads from compromised node will immediately hurt business logic
Reasons for Inconsistency
23
www.galeracluster.com
Reasons for Inconsistency
● Bad SST● Full cluster Snapshot State Transfer is external operation
(to cluster), which happens through SST API● Cluster picks a donor node to help the joining node to get
same state as active cluster● 3 main variants: mysqldump, rsync and xtrabackup based
versions, (and more to come, MariaBackup)● Improper SST leaves joining node with inconsistent
database
● Cluster misuse● wsrep_on session and global variable, requires SUPER
privileges, tells if wsrep replication plugin should replicate the transaction or not
● sql_log_bin session variable, skips binlogging for the session in master node. With no binlog events, Galera has nothing to replicate
24
www.galeracluster.com
Reasons for Inconsistency
● Replication filtering● binlog_do* No binlogging in master node● replicate_do* slave node does not apply incoming events
● Slave lag● Not real inconsistency, but replication may be slow● if reads happen from too old version of data, the
consequence for application is same as having inconsistent database
● Bug
25
www.galeracluster.com
Consistent Reads from a Cluster
● Galera Cluster orders transactions and forces them to commit in strictly same order in every cluster node
● Cluster has builtin flow control, which makes nodes’ commit pace to be very close to each other
● gcs.fc_limit, gcs.fc_factor
● If client has only one connection (to one node), it will see and operate with safe data access
26
www.galeracluster.com
MariaDB MariaDB
Consistent Reads from a Cluster
403938 434241 464544 494847
27
www.galeracluster.com
Consistent Reads from a Cluster
● If client has several connections, or connects through proxy or load balancer, which can direct client’s connections to several cluster nodes, there may happen read causality issue
28
www.galeracluster.com
MariaDB MariaDB
Read Causality
Galera Replication Replication is “synchronous”
INSERT INTO mydata
INS
29
www.galeracluster.com
MariaDB MariaDB
Read Causality
Galera Replication
INSERT INTO mydata SELECT FROM mydata
INS
30
www.galeracluster.com
MariaDB MariaDB
Read Causality
Galera Replication
Causality between INSERT and SELECT
INSERT INTO mydata SELECT FROM mydata
INS
31
www.galeracluster.com
MariaDB MariaDB
Read Causality
Galera Replication
Proxy may use several connectionsAnd trigger read causality
INSERT INTO mydata SELECT FROM mydata
INS
MaxScale
32
www.galeracluster.com
Read Causality
● Can be avoided by wsrep_sync_wait
● Makes session to wait for slave queue flushing before issuing next read
● Sync wait can be applied for:● Select, show, begin, update, delete
● Makes session’s all reads somewhat slower, should be used only when read causality is really needed
● Read causality may look like inconsistency, at first glance, but however, cluster is totally consistent
33
www.galeracluster.com
Triggers, Stored Procedures, Events
● Triggers fire in master node only, and data manipulation effects will be replicated as ROW events
● Same applies for Stored Procedures, they are processed in master node only
● Events should be carefully used. They will execute in all nodes, where they are enabled
● Managing in which node event should be enabled or slaveside_disabled, may be error prone, especially when nodes drop and join the cluster often
34
www.galeracluster.com
Foreign Key Constraint
● FK constraints add new level for data coherence in the cluster
● Galera enforces FK constraints in all cluster nodes, and any problem in constraint validation will lead to node emergency shutdown
● Supporting FK constraints in Galera replication was a major effort in Galera release 2 (year 2011-2012)
● FK violation in slave node, shows up as cluster inconsistency
35
www.galeracluster.com
Async Replication
● Galera Cluster node can operate as slave node for external MariaDB master
● Incoming native replication is treated like any client connection
● However, it is possible to configure async replication to be “pre-ordered” replication, and this type of replication stream, is trusted and no certification is performed
● This may be a vulnerability for consistency, if not carefully used
● See: wsrep_preordered
The Effect of Inconsistency
37
www.galeracluster.com
The Harm Done
● MariaDB asynchronous replication slave thread stops for an error● But, it can be configured to accept certain errors in replication
● e.g. with: --slave-skip-errors● Note that, if replication errors are not dealt with, slaves
may become inconsistent and the inconsistency just gets worse by the time
● Galera is very strict about inconsistency:● Errors in DDL are neglected● Any error in regular DML transaction replication will
cause emergency abort for the node● Errors also in foreign key constraint validation, will cause
emergency abort
38
www.galeracluster.com
MariaDB MariaDB MariaDB
Inconsistency for Multi-Master
Galera Replication
read & write read & write read & write
Error in applying will trigger emergency abort
WSWS
39
www.galeracluster.com
MariaDB MariaDB MariaDB
Inconsistency for Multi-Master
Galera Replication
read & write read & write
Error in applying will trigger emergency abort
40
www.galeracluster.com
MariaDB
Inconsistency for Multi-Master
Galera Replication
Every failed write set will be logged in GRA_x_y.dat log file x=thread ID y=seqno
WS
Binlogevents
GRA_x_y.dat
41
www.galeracluster.com
Could Some Errors Be Neglected?
● Failure to delete a row, when the row is already deleted
● Failure to insert a row, when exactly same row already exists
● In failing node, we should allow reads● This requirement is implemented by
wsrep_dirty_reads
42
www.galeracluster.com
Could Some Errors Be Neglected?
● Application implements certain business logic
● There are reads, to get knowledge of the current state, and based on results of reads, application does certain write operations
Begin Select… Select… Select… Update… Update…Commit
reads
writes
Business logic
43
www.galeracluster.com
Could Some Errors Be Neglected?
● Any replication error is a sign of data inconsistency● Even a delete of non existing row
● Inconsistency may be anything between a fresh event or a hidden inconsistency from system startup time
● Detecting an inconsistency is always “after the fact”,business logic may have been violated for ages already
● Therefore strict measures are needed, dirty data may not be used anymore
44
www.galeracluster.com
Could Some Errors Be Neglected?
● Should failing node stay in the cluster instead of shutting down?
● Should we allow reads from non primary node?● This is a long term wish, and now supported by option:
wsrep_dirty_reads ● Is it wise to read from non primary node?
● No● Why DDL errors are neglected?
● Galera replicates DDL up-front as verbatim SQL statement
● Some application frameworks generate impossible DDL on regular basis, like DROP TABLE <non-existent>
45
www.galeracluster.com
About Foreign Key Constraint
● FK constraint adds extra discipline for database content
● If data inconsistency is about to happen, FK constraints may make it surface earlier
● Supporting FK constraints, in Galera Cluster is somewhat complicated, because:
● Galera allows multi-master access● Replication slaves are highly parallel
● For this, we have to carefully control, when and how FK constraints are enforced, and what kind of multi-master operation must be rejected as FK violation
Recovering from Inconsistency
47
www.galeracluster.com
How to Recover from Inconsistency
● Practically only SST helps
● However, troubleshooting analysis is needed for figuring out which node has the most reliable data
● Also, the reason for the inconsistency should be found out
● This may help to prevent that similar problem would happen in future
● Unfortunately, analysis of inconsistency is probably the hardest problem with database clusters
● There is no indication of when the inconsistency was brought to the system
● And, often there are no signs in the logs about the incdent which caused the inconsistency
48
www.galeracluster.com
How to Recover from Inconsistency
● Can you tune your database consistent by tolerating the applying errors and hoping that RBR events will eventually write over bad data?
● In multi-master topology, absolutely NO
● In master-slave topology, some good progress might happen
● But not necessarily● Delete and inserts would work correctly● But updates, which cannot match ROW before image
would fail● And reads from this node would be always a compromise
Detecting Inconsistencies
50
www.galeracluster.com
Detecting Inconsistencies up-front
● To prevent sudden inconsistency emergencies, it would be good to harvest the database before hand for inconsistencies
● Early inconsistency detection would:• Reduce the harm of the inconsistency for business logic● Allow better time for dealing with the issue● However, the found inconsistency is equally bad to
troubleshoot● Also, the constant database scanning takes some
resources, and can itself cause issues for the cluster use● But how can you check data consistency in a live cluster, with transactions processing in high speed
51
www.galeracluster.com
Galera Consistency Checking
● Galera has inbuilt a method for running consistency checking in isolated transaction sequence slot
● Consistency checking support is performed for statements:
• INSERT...SELECT ● REPLACE...SELECT● Having special version comment /* !99997 */
CREATE TABLE consistency (id INT AUTO_INCREMENT, checksum VARCHAR(32), … )
INSERT INTO consistency (checksum,...) SELECT MD5(GROUP_CONCAT(i,j)) FROM mytable /* !99997 */
52
www.galeracluster.com
Galera Consistency Checking
MariaDB MariaDB MariaDB
Galera Replication
read & write read & write read & write
WS947
WS948
WS949
WS950
WS951
WS952
WS953
WS945
WS946
53
www.galeracluster.com
Galera Consistency Checking
MariaDB MariaDB MariaDB
Galera Replication
read & write read & write read & write
WS956
WS948
WS949
WS950
WS951
WS952
WS953
WS954
WS955
INSERT...SELECT… /* !99997 */
54
www.galeracluster.com
Galera Consistency Checking
MariaDB MariaDB MariaDB
Galera Replication
read & write read & write read & write
WS956
WS957
WS958
WS959
WS960
WS961
WS962
WS954
WS955
INSERT...SELECT… /* 99997 */
WS963
WS964
WS965
55
www.galeracluster.com
Galera Consistency Checking
MariaDB MariaDB MariaDB
Galera Replication
read & write read & write read & write
WS956
WS957
WS958
WS959
WS960
WS961
WS962
WS955
WS963
WS964
WS965
56
www.galeracluster.com
Galera Consistency Checking
● The INSERT...SELECT is run in total order isolation (TOI)
● It will block all commits until the INSERT..SELECT has completed
● This may have impact on overall transaction throughput, avoid too large select result set
● Pt-table-checksum tool uses Galera consistency checking support, and scans automatically all tables for inconsistencies
Optimizing Inconsistency Shutdown
58
www.galeracluster.com
Optimizing Inconsistency Shutdown
● Current Policy for Inconsistency:● For suspected inconsistency, cluster node will do emergency shutdown
● However, DDL failures are logged only as warnings
● Injected inconsistency in one node can cause all other nodes to shutdown
59
www.galeracluster.com
Inconsistency Shutdown
Galera Replication
Node A Node B Node C
Create table t1 (i int)
60
www.galeracluster.com
Inconsistency Shutdown
Galera Replication
Node A Node B Node C
t1 t1 t1
Create table t1 (i int)
61
www.galeracluster.com
Inconsistency Shutdown
Galera Replication
Node A Node B Node C
Set wsrep_on=OFFInsert into t values (8)
t1 t1 t1
8
62
www.galeracluster.com
Inconsistency Shutdown
Node A Node B Node C
Set wsrep_on=ONDelete from t;
t1 t1 t1
8
Del 8
Del 8
63
www.galeracluster.com
Inconsistency Shutdown
Node A Node B Node C
t1 t1 t1
Set wsrep_on=ONDelete from t;
Del 8
Del 8
64
www.galeracluster.com
Inconsistency Shutdown
Node A Node B Node C
t1 t1 t1
Node A, remains in minority and changes to
Non-Primary
65
www.galeracluster.com
Optimizing Consistency Shutdown
● Codership has developed a method to optimize node emergency shutdowns due to suspected inconsistency, to happen only for minimal set of compromised nodes
● When inconsistency is observed, nodes will communicate through consistency voting protocol to compare which nodes face the similar issue in applying
● Target is to find the majority, which have same understanding about the fate of the offending write set
66
www.galeracluster.com
Inconsistency Shutdown
Node A Node B Node C
8t1 t1 t1
Set wsrep_on=ONDelete from t;
Del 8
Del 8
Consistency Voting
67
www.galeracluster.com
Consistency Voting
Node A Success
Node B Error: ‘row not found’
Node C Error: ‘row not found’
68
www.galeracluster.com
Inconsistency Shutdown
Node A Node B Node C
t1 t1 t1
Consistency Voting
69
www.galeracluster.com
Galera Consistency Voting Protocol
● With consistency voting, Galera Cluster can mitigate the harm of inconsistency for the cluster
● In the best case, only one node has to abort, and majority can continue operating normally
● However, the database has been inconsistent, for indefinitely long period, and application business logic may have been hurt
70
www.galeracluster.com
Summary
● Database inconsistency is a critical problem and very hard to track down
● It affects both MS and MM topologies● But, inconsistency problem is immediate in MM
● It is possible to scan Galera Clusters for inconsistencies up-front
● Recovery requires full database copy
● Galera Consistency Voting protocol will help, but only in saving the cluster to remain in service
74
www.galeracluster.com
4
Happy Clustering :-)Thank you for listening!