mysql replication

15
Copyright © 2006-2014 Percona LLC MySQL Replication By Stephane Combaudon, Aurimas Mikalauskas and Muhammad Irfan MySQL's built-in replication is the foundation for building large, high-performance applications on top of MySQL using the "scale-out" architecture. MySQL replication enables DBAs to configure one or more server as replicas of another server - keeping the data synced with the master copy. This Percona eBook answers the questions: * "How does MySQL Replication really work?" * "How to identify and cure MySQL replication slave lag" * "GTIDs in MySQL 5.6: New replication protocol; new ways to break replication"

Upload: dbaenk

Post on 03-Oct-2015

7 views

Category:

Documents


0 download

DESCRIPTION

MySQL servers replication manual

TRANSCRIPT

  • Copyright 2006-2014 Percona LLC

    MySQL ReplicationBy Stephane Combaudon, Aurimas Mikalauskas and

    Muhammad Irfan

    MySQL's built-in replication is the foundation for building large, high-performance applications on top of MySQL using the "scale-out" architecture. MySQL replication enables DBAs to configure one or more server as replicas of another server - keeping the data synced with the master copy.

    This Percona eBook answers the questions: * "How does MySQL Replication really work?"* "How to identify and cure MySQL replication slave lag"* "GTIDs in MySQL 5.6: New replication protocol; new ways to break replication"

  • MySQL Replication

    36

    Chapter 1: How does MySQL Replication really work?Chapter 2: How to identify and cure MySQL replication slave lagChapter 3: GTIDs in MySQL 5.6: New replication protocol; new ways to break replication 11

    Table of Contents

    Percona was founded in August 2006 and now employs a global network of experts with a staff of more than 120 people. Our customer list is large and diverse, including Fortune 50 corporations, popular websites, and small startups. We have over 2,000 customers and, although we do not reveal all of their names, chances are we're working with nearly every large MySQL user you've heard about. To put Percona's MySQL expertise to work for you, please contact us.

    About Percona

    Skype: oncall.perconaGTalk: [email protected] (AOL Instant Messenger): oncallpercona Telephone direct-to-engineer: +1-877-862-4316 or UK Toll Free: +44-800-088-5561Telephone to live operator: +1-888-488-8556 Customer portal: https://customers.percona.com/

    Copyright 2006-2014 Percona LLC

    Is this an emergency? Get immediate assistance from Percona Support 24/7. Click here

    http://www.percona.com/contact/24x7-emergencyhttp://www.percona.com/about-perconahttp://www.percona.com/contact/sales

  • MySQL ReplicationChapter 1: How does MySQL Replication

    How does MySQL Replication really work?

    While Percona experts have written in-depth on MySQL replication on our MySQL PerformanceBlog, such as on replication being single-threaded, on semi-synchronous replication or on estimating replication capacity, I dont think weve ever covered the very basics of how MySQLreplication really works. Thats one of the main reasons for publishing this eBook on the subject.

    Of course, there are many aspects of MySQL replication, but my main focus will be the logistics how replication events are written on the master, how they are transferred to the replication slaveand then how they are applied there. Note that this is NOT a HOWTO setup replication, but rathera type of thing.

    Replication events

    I say replication events in this article because I want to avoid discussion about different replicationformats. These are covered pretty well . Put simply, the events can be one of two types:

    Statement based in which case these are write queriesRow based in this case these are changes to records, sort of row diffs if you will

    But other than that, I wont be going back to differences in replication with different replicationformats, mostly because theres very little thats different when it comes to transporting the datachanges.

    On the master

    So now let me start with what is happening on the master. For replication to work, first of all masterneeds to be writing replication events to a special log called binary log. This is usually verylightweight activity (assuming events are not synchronized to disk), because writes are bufferedand because they are sequential. The binary log file stores data that replication slave will bereading later.

    Whenever a replication slave connects to a master, master creates a new thread for the connection(similar to one thats used for just about any other server client) and then it does whatever theclient replication slave in this case asks. Most of that is going to be (a) feeding replication slavewith events from the binary log and (b) notifying slave about newly written events to its binary log.

    Slaves that are up to date will mostly be reading events that are still cached in OS cache on themaster, so there is not going to be any physical disk reads on the master in order to feed binary log

    3

    By Aurimas Mikalauskas

    http://www.percona.com/blog/http://www.percona.com/blog/http://www.percona.com/blog/2010/10/20/mysql-limitations-part-1-single-threaded-replication/http://www.percona.com/blog/2012/01/19/how-does-semisynchronous-mysql-replication-work/http://www.percona.com/blog/2010/07/20/estimating-replication-capacity/

  • MySQL ReplicationChapter 1: How does MySQL Replication

    events to slave(s). However, when you connect a replication slave that is few hours or even daysbehind, it will initially start reading binary logs that were written hours or days ago master may nolonger have these cached, so disk reads will occur. If master does not have free IO resources, youmay feel a bump at that point.

    On the replica

    Now lets see what is happening on the slave. When you start replication, two threads are startedon the slave:

    1. IO thread

    This process called IO thread connects to a master, reads binary log events from the master asthey come in and just copies them over to a local log file called relay log. Thats all.

    Even though theres only one thread reading binary log from the master and one writing relay logon the slave, very rarely copying of replication events is a slower element of the replication. Therecould be a network delay, causing a steady delay of few hundred milliseconds, but thats about it.

    If you want to see where IO thread currently is, check the following in show slave statusG:

    Master_Log_File last file copied from the master (most of the time it would be the sameas last binary log written by a master)Read_Master_Log_Pos binary log from master is copied over to the relay log on theslave up until this position.

    And then you can compare it to the output of show master statusG from the master.

    2. SQL thread

    The second process SQL thread reads events from a relay log stored locally on the replicationslave (the file that was written by IO thread) and then applies them as fast as possible.

    This thread is what people often blame for being single-threaded. Going back to show slavestatusG, you can get the current status of SQL thread from the following variables:

    Relay_Master_Log_File binary log from master, that SQL thread is working on (inreality it is working on relay log, so its just a convenient way to display information)Exec_Master_Log_Pos which position from master binary log is being executed by SQLthread.

    4

  • MySQL ReplicationChapter 1: How does MySQL Replication

    Now I want to briefly touch the subject of replication lag in this context. When you are dealing withreplication lag, first thing you want to know is which of the two replication threads is behind. Most ofthe time it will be the SQL thread, still it makes sense to double check. You can do that bycomparing the replication status variables mentioned above to the master binary log status fromthe output of show master statusG from the master.

    If it happens to be IO thread, which, as I mentioned many times already, is very rare, one thing youmay want to try to get that fixed is .

    Otherwise, if you are sure it is SQL thread, then you want to understand what is the reason andthat you can usually observe by vmstat. Monitor server activity over time and see if it is r or bcolumn that is scoring most of the time. If it is r, replication is CPU-bound, otherwise IO. If itis not conclusive, mpstat will give you better visibility by CPU thread.

    Note this assumes that there is no other activity happening on the server. If there is some activity,then you may also want to look at diskstats or even do a query review for SQL thread to get a goodpicture.

    If you find that replication is CPU bound, this maybe very helpful.

    If it is IO bound, then fixing it may not be as easy (or rather, as cheap). Let me explain. If replicationis IO bound, most of the time that means that SQL thread is unable to read fast enough becausereads are single threaded. Yes, you got that right it is reads that are limiting replicationperformance, not writes. Let me explain this further.

    Assume you have a RAID10 with a bunch of disks and write-back cache. Writes, even though theyare serialized, will be fast because they are buffered in the controller cache and because internallyRAID card can parallelize writes to disks. Hence replication slave with similar hardware can writejust as fast as master can.

    Now Reads. When your workset does not fit in memory, then the data that is about to get modifiedis going to have to be read from disk first and this is where it is limited by the single-threadednature of the replication, because one thread will only ever read from one disk at a time.

    That being said, one solution to fix IO-bound replication is to increase the amount of memory soworking set fits in memory. Another get IO device that can do much more IO operations persecond even with a single thread fastest traditional disks can do up to 250 iops, SSDs in theorder of 10,000 iops.

    In Chapter Two, Percona support engineer Muhammad Irfan will talk about identifying and curingMySQL replication slave lag.

    5

    Replication lag

    http://www.percona.com/doc/percona-toolkit/2.1/pt-diskstats.htmlhttp://www.percona.com/blog/2008/09/22/fighting-mysql-replication-lag/

  • MySQL ReplicationChapter 2: How to identify and cure MySQL replication slave lag

    How to identify and cure MySQL replication slave lag

    Here on the Percona MySQL Support team, we often see issues where acustomer is complaining about replication delays and many times the problem ends up being tiedto MySQL replication slave lag. This of course is nothing new for MySQL users and weve had afew posts here on the MySQL Performance Blog on this topic over the years (two particularlypopular post in the past were: Reasons for MySQL Replication Lag and Managing Slave Lagwith MySQL Replication, both by Percona CEO Peter Zaitsev).

    In this chapter, however, I will share some new ways of identifying delays in replication includingpossible causes of lagging slaves and how to cure this problem.

    How to identify Replication DelayMySQL replication works with two threads, IO_THREAD & SQL_THREAD. IO_THREAD connectsto a master, reads binary log events from the master as they come in and just copies them over toa local log file called relay log. On the other hand, SQL_THREAD reads events from a relay logstored locally on the replication slave (the file that was written by IO thread) and then applies themas fast as possible. Whenever replication delays, its important to discover first whether itsdelaying on slave IO_THREAD or slave SQL_THREAD.

    Normally, I/O thread would not cause a huge replication delay as it is just reading the binary logsfrom the master. However, It depends on the network connectivity, network latency how fast isthat between the servers. The Slave I/O thread could be slow because of high bandwidth usage.Usually, when the slave IO_THREAD is able to read binary logs quickly enough it copies and pilesup the relay logs on the slave which is one indication that the slave IO_THREAD is not the culpritof slave lag.

    On the other hand, when the slave SQL_THREAD is the source of replication delays it is probablybecause of queries coming from the replication stream are taking too long to execute on the slave.This is sometimes because of different hardware between master/slave, different schema indexes,workload. Moreover, the slave OLTP workload sometimes causes replication delays because oflocking. For instance, if a long-running read against a MyISAM table blocks the SQL thread, or anytransaction against an InnoDB table creates an IX lock and blocks DDL in the SQL thread. Also,

    6

    By Muhammad Irfan

    take into account that slave is single threaded prior to MySQL 5.6, which would be another reasonfor delays on the slave SQL_THREAD.

    Let me show you via master status/slave status example to identify either slave is lagging on slaveIO_THREAD or slave SQL_THREAD on the following page.

    http://www.percona.com/products/mysql-supporthttp://www.percona.com/blog/2011/07/29/reasons-for-mysql-replication-lag/http://www.percona.com/blog/2007/10/12/managing-slave-lag-with-mysql-replication/http://www.percona.com/blog/2007/10/12/managing-slave-lag-with-mysql-replication/

  • MySQL ReplicationChapter 2: How to identify and cure MySQL replication slave lag

    7

  • MySQL ReplicationChapter 2: How to identify and cure MySQL replication slave lag

    mysql-bin.018192 (Relay_Master_Log_File from slave status) This indicates that the slaveSQL_THREAD is applying events fast enough, but its lagging too, which can be observed fromthe difference between Read_Master_Log_Pos & Exec_Master_Log_Pos from show slavestatus output.

    You can calculate slave SQL_THREAD lag from Read_Master_Log_Pos Exec_Master_Log_Pos in general as long as Master_Log_File parameter output from showslave status and Relay_Master_Log_File parameter from show slave status output are the same.This will give you rough idea how fast slave SQL_THREAD is applying events. As I mentionedabove, the slave IO_THREAD is lagging as in this example then off course slave SQL_THREAD isbehind too. You can read detailed description of show slave status output fields

    Also, the Seconds_Behind_Master parameter shows a huge delay in seconds. However, this canbe misleading, because it only measures the difference between the timestamps of the relay logmost recently executed, versus the relay log entry most recently downloaded by the IO_THREAD.If there are more binlogs on the master, the slave doesnt figure them into the calculation ofSeconds_behind_master. You can get a more accurate measure of slave lag using pt-heartbeatfrom Percona Toolkit. So, we learned how to check replication delays either its slaveIO_THREAD or slave SQL_THREAD. Now, let me provide some tips and suggestions for whatexactly causing this delay.

    Tips and Suggestions What Causing Replication Delay & Possible FixesUsually, the slave IO_THREAD is behind because of slow network between master/slave. Most ofthe time, enabling helps to mitigate slave IO_THREAD lag. One other suggestion is to disablebinary logging on slave as its IO intensive too unless you required it for point in time recovery.

    To minimize slave SQL_THREAD lag, focus on query optimization. My recommendation is toenable the configuration option so that the queries executed by slave that take more than will belogged to the slow log. To gather more information about query performance, I would alsorecommend setting the configuration option log_slow_verbosity to full.

    This way we can see if there are queries executed by slave SQL_thread that are taking long time tocomplete. You can follow my previous post about how to enable slow query log for specific timeperiod with mentioned options here. And as a reminder, log_slow_slave_statements as variablewere first introduced in Percona Server 5.1 which is now part of vanilla MySQL from version 5.6.11In upstream version of MySQL Server log_slow_slave_statements were introduced as commandline option. Details can be found here while log_slow_verbosity is Percona Server specific feature.

    8

    This clearly suggests that the slave IO_THREAD is lagging and obviously because of that the slaveSQL_THREAD is lagging, too, and it yields replication delays. As you can see the Master log file is mysql-bin.018196 (File parameter from master status) and slave IO_THREAD ison mysql-bin.018192 (Master_Log_File from slave status) which indicates slave IO_THREAD isreading from that file, while on master its writing on mysql-bin.018196, so the slave IO_THREADis behind by 4 binlogs. Meanwhile, the slave SQL_THREAD is reading from same file i.e.

    http://www.percona.com/doc/percona-toolkit/2.2/pt-heartbeat.htmlhttp://www.percona.com/software/percona-toolkithttp://www.percona.com/doc/percona-server/5.1/diagnostics/slow_extended.html?id=percona-server:features:slow_extended_51&redirect=2#log_slow_verbosityhttp://www.percona.com/blog/2014/03/14/tools-and-tips-for-analysis-of-mysqls-slow-query-log/http://www.percona.com/software/percona-serverhttp://www.percona.com/doc/percona-server/5.1/diagnostics/slow_extended.html?id=percona-server:features:slow_extended_51&redirect=2#log_slow_slave_statements

  • MySQL ReplicationChapter 2: How to identify and cure MySQL replication slave lag

    One improvement is made for this case in MySQL 5.6, where in memory hash is used comes tothe rescue.

    Note that Seconds_Behind_Master is not updated while we read huge RBR event, So, laggingmay be related to just that we had not completed reading of the event. For example, in row basedreplication huge transactions may cause delay on slave side e.g. if you have 10 million rows tableand you do DELETE FROM table WHERE id 5M rows will be sent to slave, each row separatelywhich will be painfully slow. So, if you have to delete oldest rows time to time from huge table usingPartitioning might be good alternative for this for some kind of workloads where instead usingDELETE use DROP old partition may be good and only statement is replicated because it will beDDL operation.

    To explain it better, let suppose you have partition1 holding rows of IDs from 1 to 1000000, partition2 IDs from 1000001 to 2000000 and so on so instead of deleting via statementDELETE FROM table WHERE ID /dev/null mysqld_alive=$? if [[ $mysqld_alive == 0 ]] thenseconds_behind_master=$(mysql $EXT_ARGV -e "show slave status" --vertical | grepSeconds_Behind_Master | awk '{print $2}') echo $seconds_behind_master else echo 1 fi } #Uncomment below to test that trg_plugin function works as expected #trg_plugin ------- -- That'sthe pt-plug.sh file you would need to create and then use it as below with pt-stalk: $/usr/bin/pt-stalk --function=/root/pt-plug.sh --variable=seconds_behind_master --threshold=300--cycles=60 [email protected] --log=/root/pt-stalk.log--pid=/root/pt-stalk.pid --daemonize

    You can adjust the threshold, currently its 300 seconds, combining that with cycles, it means thatif seconds_behind_master value is >= 300 for 60 seconds or more then pt-stalk will start capturingdata. Adding notify-by-email option will notify via email when pt-stalk captures data. You canadjust the pt-stalk thresholds accordingly so thats how it triggers to collect diagnostic data duringproblem.

    9

    One other reason for delay on the slave SQL_THREAD if you use row-based binlog format is that if any database table is missing a primary key or unique key then it will scan all rows of the table for DML on slave and causes replication delays. Make sure all your tables have a primary keyor unique key. Check this bug report for details. You can use the query below on the slave to identify which database tables are missing primary or unique keys.

    http://www.percona.com/doc/percona-toolkit/2.2/pt-stalk.html#cmdoption-pt-stalk--notify-by-emailhttp://bugs.mysql.com/bug.php?id=53375

  • MySQL ReplicationChapter 2: How to identify and cure MySQL replication slave lag

    ConclusionA lagging slave is a tricky problem but a common issue in MySQL replication. Ive tried to covermost aspects of replication delays in this post. Please share in the comments section if you know ofany other reasons for replication delay.

    10

  • MySQL ReplicationChapter 3: GTIDs in MySQL 5.6: New replication protocol; new ways to break replication

    GTIDs in MySQL 5.6: New replication protocol; new ways to break replication

    One of the MySQL 5.6 features many people are interested in is Global Transactions IDs (. This isfor a good reason: Reconnecting a slave to a new master has always been a challenge while it isso trivial when GTIDs are enabled. However, using GTIDs is not only about replacing good oldbinlog file/position with unique identifiers, it is also using a new replication protocol. And if you arenot aware of it, it can bite.

    Replication protocols: old vs new

    The old protocol is pretty straightforward: the slave connects to a given binary log file at a specificoffset, and the master sends all the transactions from there.

    The new protocol is slightly different: the slave first sends the range of GTIDs it has executed, andthen the master sends every missing transaction. It also guarantees that a transaction with a givenGTID can only be executed once on a specific slave.

    In practice, does it change anything? Well, it may change a lot of things. Imagine the followingsituation: you want to start replicating from trx 4, but trx 2 is missing on the slave for some reason.

    With the old replication protocol, trx 2 will never be executed while with the new replicationprotocol, it WILL be executed automatically.

    Here are 2 common situations where you can see the new replication protocol in action.

    Skipping transactions

    11

    By Stephane Combaudon

    http://www.percona.com/blog/wp-content/uploads/2014/04/new_protocol.png

  • MySQL ReplicationChapter 3: GTIDs in MySQL 5.6: New replication protocol; new ways to break replication

    It is well known that the good old SET GLOBAL sql_slave_skip_counter = N is no longer supportedwhen you want to skip a transaction and GTIDs are enabled. Instead, to skip the transaction with GTID XXX:N, you have to inject an empty transaction:

    Why cant we use sql_slave_skip_counter? Because of the new replication protocol!

    Imagine that we have 3 servers like the picture below:

    Lets assume that sql_slave_skip_counter is allowed and has been used on S2 to skip trx 2. Whathappens if you make S2 a slave of S1?

    12

    Skipping transactions

    http://www.percona.com/blog/2013/03/26/repair-mysql-5-6-gtid-replication-by-injecting-empty-transactions/http://www.percona.com/blog/wp-content/uploads/2014/05/new_protocol2.png

  • MySQL ReplicationChapter 3: GTIDs in MySQL 5.6: New replication protocol; new ways to break replication

    If trx 2 is still in the binary logs of S1, it will be sent to S2, and the transaction is no longerskipped.If trx 2 no longer exists in the binary logs of S1, you will get a replication error.

    This is clearly not safe, thats why sql_slave_skip_counter is not allowed with GTIDs. The onlysafe option to skip a transaction is to execute a fake transaction instead of the real one.

    Errant transactions

    If you execute a transaction locally on a slave (called errant transaction in the MySQLdocumentation), what will happen if you promote this slave to be the new master?

    With the old replication protocol, basically nothing (to be accurate, data will be inconsistentbetween the new master and its slaves, but that can probably be fixed later).

    With the new protocol, the errant transaction will be identified as missing everywhere and will beautomatically executed on failover, which has the potential to break replication.

    Lets say you have a master (M), and 2 slaves (S1 and S2). Here are 2 simple scenarios wherereconnecting slaves to the new master will fail (with different replication errors):

    # Scenario 1

    13

    Both servers will exchange the range of executed GTIDs, and S1 will realize that it has to send trx 2 to S2. Two options then:

  • MySQL ReplicationChapter 3: GTIDs in MySQL 5.6: New replication protocol; new ways to break replication

    # Scenario 2

    As you can understand, errant transactions should be avoided with GTID-based replication. If youneed to run a local transaction, your best option is to disable binary logging for that specificstatement:

    14

    ConclusionGTIDs are a great step forward in the way we are able to reconnect replicas to other servers. Butthey also come with new operational challenges. If you plan to use GTIDs, make sure you correctlyunderstand the new replication protocol, otherwise you may end up breaking replication in new andunexpected ways.

    Ill do more exploration about errant transactions in a future post on Percona's MySQLPerformance Blog. I invite you to join me there for a conversation on the topic.

    http://www.percona.com/blog/http://www.percona.com/blog/

  • MySQL ReplicationChapter 3: GTIDs in MySQL 5.6: New replication protocol; new ways to break replication

    Powered by TCPDF (www.tcpdf.org)

    Copyright 2006-2014 Percona LLC

    About the authors

    During his career, both as a system administrator and a performance engineer, Percona architect Aurimas Mikalauskas has become familiar with many different technologies and how to apply them. He always knows the right tools for the task. In addition to MySQL performance optimization, his key areas of expertise include: MySQL High Availability, full text search, web and content cache servers, and MySQL data recovery.

    Percona architect Stphane Combaudon joined Percona in July 2012 after working as a MySQL DBA for leading French companies. In real life, he lives in Paris with his wife and their twin daughters. When not in front of a computer or not spending time with his family, he likes playing chess and hiking.

    Muhammad Irfan is vastly experienced in the LAMP Stack. He joined Percona in December 2012 as Support Engineer. Prior to joining Percona, he worked in the role of MySQL DBA & LAMP Administrator, maintained high traffic websites, and worked as a Consultant. His professional interests focus on MySQL scalability and on performance optimization. In his spare time, he normally spends time with family and friends and loves to play and watch cricket.

    Visit Percona's ever-growing MySQL eBook library for more great content at http://www.percona.com/resources/mysql-ebooks

    http://www.tcpdf.orghttp://www.percona.com/resources/mysql-ebooks

    Table of ContentsMySQL ReplicationHow does MySQL Replication really work?How to identify and cure MySQL replication slave lagGTIDs in MySQL 5.6: New replication protocol; new ways to break replication

    Untitled