con5409_jadhav-yahoo case study- mysql gtids and parallel or multithreaded replication

Upload: sbabuind

Post on 06-Jul-2018

219 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    1/57

    Yahoo Case Study: MySQL GTIDs and Parallel orMultithreaded Replication

    PRESENTED BY Stacy Yuan, Yashada Jadhav October 2015

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    2/57

     About Yahoo

    !  Yahoo is focused on making the world’s daily habits inspiring and

    entertaining.

    !  By creating highly personalized experiences for our users, we keep people

    connected to what matters most to them, across devices and around theworld.

    !  In turn, we create value for advertisers by connecting them with the

    audiences that build their businesses

    !  More than 1B monthly active users across Yahoo and Tumblr

    More than 575M mobile monthly active users across Yahoo and Tumblr

    Yahoo Case Study: MySQL GTIDs and Parallel orMultithreaded Replication 

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    3/57

     Ad Products Team

    Mission Statement: Delivering scalable and cost efficient data services

    through innovation and automation powering Yahoo Products

    !  Thousands of Production Servers

    !  OLTP systems & Data marts

    !  Database Design and Architecture

    !  Capacity Planning and Performance Reviews

    !  24x7 Monitoring and Operational Support

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    4/57

    MySQL at Yahoo

    !  MySQL powers many mission-critical products within Advertisingand User space across Desktop and Mobile

    !  Multiple production configurations based on product requirement

    !

     

    DBaaS setup for multiple products!  Yahoo Sports: Mobile friendly

    !  Flickr: Sharded across thousands of servers

    !  Hot:Hot, Hot:Warm Configurations

    !  Versions range from Percona Server 5.1 to 5.6 including PerconaXtraDB Cluster

    !  Operating systems running customized RHEL 5.6 to 6.5

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    5/57

     About Stacy

    !  Senior MySQL database administrator

    !  10+ years of experience on various flavors of relational databases.

    !  Focus on performance tuning, code reviews, database deployment

    and infrastructure management for MySQL!  In her spare time, she enjoys reading books and doing some

    volunteer work.

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    6/57

     About Yashada

    !  MySQL DevOps Engineer with a background in database designand performance tuning.

    !  4+ years of experience on various flavors of relational databases.

    In her spare time, she enjoys listening to music and going toconcerts. She appreciates sarcasm a lot too!

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    7/57

    What are the next 45 minutes about?

    !  GTID Replication!   Advantages and Disadvantages!  Performance when compared to regular replication

    !  Multi threaded slaves!  Why do we want MTS?!

     

    MTS vs single threaded replication - Performance tests

    !  Rolling out GTID and MTS to a live system with no downtime

    !  GTID and MTS in Production!

     

    Operational issues!  Monitoring and HA!  Backups using xtrabackup

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    8/57

    Why go for GTID and MTS

    !  Slave promotion becomes easier with a global transaction ID

    !  Multitenant database systems suffer from problems like resource

    contention due to bad queries, batch jobs etc. that affect replication.

    !  MTS without GTID - replication co-ordinates might no longer beaccurate due to multiple parallel worker threads.

    MTS with GTID

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    9/57

    GTID Replication

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    10/57

    File-based Replication

    Enables data from one MySQL database server (the master) to bereplicated to one or more MySQL database servers (the slaves) throughMySQL log file and its position

    !  Needs replication user, binlog is enabled!

     

    Needs a copy of master database!  Connect to master through master_host, port, replication user,

    master log file and its position.!  Each slave pulls the data from the master, and execute the events

    to the slave.

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    11/57

    GTID Replication

     A global transaction identifier (GTID) is a unique identifier created andassociated with each transaction committed on the server of origin(master).

    GTID is unique not only to the server on which it originated, but isunique across all servers in a given replication setup.

    GTID = source_id :transaction_id

    !  The source_id  identifies the originating server.!

     

    The transaction_id  is a sequence number determined by the order inwhich the transaction was committed on this server.

    Example:!  5c7401d3-3623-11e5-ae8c-78e7d15fd641:1-13476

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    12/57

    GTID Replication Advantage

    !  Replication topology is easy to change - binlog file name andposition are not required any more instead we usemaster_auto_position=1

    !  Failover is simplified

    Increase performance in relay slave - set sync_binlog=0

    !  Managing multi-tiered replication is easier

    Master_log_file=‘mysql-bin.***’Master_log_pos=****

    master_auto_position=1

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    13/57

    Replication Failover Comparison

    Regular Rep FailoverIf S1 is bad, S4S5 needto be rebuilt.

    M M

    GTID Rep FailoverRedirectS4 to M

    S2

    S3

    S1

    S4

    S5

    S2 S1

    S3 S4

    S5

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    14/57

    GTID Replication Limitations

    !  GTID does not provide replication monitoring

    !  SQL_SKIP_SLAVE_COUNTER does not work

    !  Can not force the database to start replication from specific position

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    15/57

    GTIDs Replication Caveats

    !  Updates involving non-transactional storage engines.

    !  CREATE TABLE ... SELECT statements is not supported.

    !  Temporary table is not supported inside a transaction

    To prevent GTID-based replication to fail: enforce-gtid-consistency

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    16/57

    Replication Performance GTID vs Regular Rep

    In terms of performance, GTID is almost same as regular replication. Itis slightly slower.

    The reasons could be -

    GTIDs write more lines into binary log - information about GTID!  GTID performs additional checks for transactions

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    17/57

    GTID vs Regular Replication

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    18/57

    GTID vs Regular Replication

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    19/57

    GTID vs Regular Replication

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    20/57

    Multi threaded Replication

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    21/57

    Replication Performance Issue

    ! Multi threaded applications write to the master in a parallel fashion

    ! But the replication from master to slave is single thread, it becomesbottleneck in a busy system.

    Master Slave

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    22/57

    Multi-Threaded Slaves (MTS)

    !  Type of thread – Coordinator thread and Worker thread!  Coordinator thread on slave dispatches work across several worker

    threads!  Each worker thread commit transaction individually.!  Multiple active schemas/databases can take advantage of parallel

    replication

    Master Slave

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    23/57

    MTS Prerequisites

    !  MySQL 5.6 above!  Transactions are independently based on different databases.!  Multitenant databases is the best to enable MTS!  N databases, use N parallel workers

    slave_parallel_workers = N

    !  Example: 3 databases in MySQL, better to setslave_parallel_workers =3

    Masterdb1, db2, db3

    Slavedb1, db2, db3

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    24/57

    Configure MTS

    STOP SLAVE;!  SET GLOBAL slave_parallel_workers=3;!  START SLAVE;

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    25/57

    MTS Execution Gaps and Checkpoint

    Events are no longer guaranteed to be consecutive

    !  Execution gaps are tracked

    !  Checkpoints are performed from time to time

    Check settingsslave_checkpoint_period default 300 msslave_checkpoint_group default 512 trx

    !  Exec_Master_Log_Pos shows the latest checkpoint and not latest

    transaction

    !  How to fix execution gaps -STOP SLAVE; START SLAVE UNTIL SQL_AFTER_MTS_GAPS

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    26/57

    Convert MTS to Single-threaded

    Run MTS until no more gaps are found in the relay log

    !  Stop Replication

    !  Configure single threaded slave

    !  Start single threaded slave

    START SLAVE UNTIL SQL_AFTER_MTS_GAPS;

    SET @@GLOBAL.slave_parallel_workers = 0;

    START SLAVE;

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    27/57

    MTS Advantages and Limitations

     Advantages:

    !  Take advantage of multi-core servers

    !  Changes to each schema applied and committed independently by

    worker threads!  Smaller risk of data loss

    Limitations:

    !  START SLAVE UNTIL no longer support

    !  Foreign Keys cross-referencing DBs will disable MTS

    !  No implicit transaction retry after transient failure

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    28/57

    MTS Caveats

    !  Enforcing foreign key relationships between tables in differentdatabases causes MTS to use sequential mode which can havenegative impact on performance

    Single database replication, it slows down the replicationperformance

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    29/57

    MTS without GTID

    Exec_Master_Log_Pos in SHOW SLAVE STATUS is misleading.

    !  Skipping replication errors with SQL_SLAVE_SKIP_COUNTER=1 isdangerous

    !  Backup from slave, either mysqldump and xtrabackup might not getright position

    GTID comes to the rescue

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    30/57

    Performance Testing - GTID with MTS Setup

    Test scenario:!  one master,!  two slaves (one is single-threaded replication, another slave is multi-

    threaded replication both using GTID

    Master

    Slave1 Slave2

    GTID Rep MTS GTID Rep

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    31/57

    Replication Performance Comparison

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    32/57

    Replication Performance ComparisonQPS is increased about 3 or 4 timesLoad, CPU, and Writes per second are increased as well

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    33/57

    Replication Performance Comparison

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    34/57

    GTID with MTS enabled: Things to watch out for

    Exec_Master_Log_Pos is no longer reliable!  Executed_Gtid_Set is the reliable

    !  SQL_SLAVE_SKIP_COUNTER no longer works

    !  START SLAVE UNTIL is not supported

    Slave_transaction_retries is treated as 0, and can not be changed.

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    35/57

    Rolling out GTID and MTS to production

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    36/57

    Online Rollout GTID with MTS in Percona Server

    MySQL56 requires downtime to enable GTID, it is not acceptable

    !  With Percona server 5.6, with almost no downtimeThe variable GTID_DEPLOYMENT_STEP plays an important role

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    37/57

    Database Servers Setup

    Dual masters setup ! Masters setup cross different colos.

    ! Each master carries one slave DNS

    Prod master BCP master

    Prod slave BCP slave

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    38/57

    Enable GTID without downtime

    Enable GTID in BCP side

    1. Make sure BCP master andBCP slave are sync

    2. Stop mysqld in BCP master and BCP slave,

    add gtid_deployment_step=on, 

    gtid_mode=ON ,

    enforce-gtid-consistency into my.cnf

    Restart mysqld in both servers.

    3. Replication from prod master to

    BCP master is good.

    DNS

    Prod master

    BCP master  GTID_deployme

    nt_step=on

    Prod slave

    BCP slaveGTID_deployme

    nt_step=on

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    39/57

    Enable GTID without downtime

    Promote BCP master to Prod master

    4. Prod master: set global read_only=on

    5. BCP master:

    set global gtid_deployment_step = off;

    set global read_only=off;

    6. The replication from BCP master to

    Prod master is broken.

    DNS

    Prod master

    BCP masterGTID_deployme

    nt_step=off

    Prod slave

    BCP slaveGTID_deployme

    nt_step=on

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    40/57

    Enable GTID without downtime

    Enable GTID in Prod master

    7. Enable GTID on old prod master and prod slave

    8. Fix replication from BCP master to prod master

    CHANGE MASTER TO

    MASTER_AUTO_POSITION = 1;

    START SLAVE;

    9. Enable GTID replication from

    Prod master to BCP master

    10. Enable MTS in all serversstop slave;

    set global slave_parallel_workers=16;start slave;

    DNS

    Prod masterGTID enabled

    BCP masterGTID_deployme

    nt_step=off

    Prod slaveGTID enabled

    BCP slaveGTID_deployme

    nt_step=on

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    41/57

    Enable GTID without downtime

    Switch back

    10. Perform switchover in Prod master

    Disable gtid_deployment_step across all servers.DNS

    Prod masterGTID enabled

    BCP masterGTID_deployme

    nt_step=off

    Prod slaveGTID enabled

    BCP slaveGTID_deployme

    nt_step=off

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    42/57

     

    Switchover Steps

    • 

    Enable global read_only=on in prod master

    • 

    Sanity check to make sure BCP master catch up its master(WAIT_UNTIL_SQL_THREAD_AFTER_GTIDS)

    • 

    Disable read_only in BCP master. BCP master becomes prodmaster

    Failover:

    • 

    If prod master is unreachable, it will be failover without step 1 and 2.

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    43/57

    GTID and MTS in Production : MySQL Ops

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    44/57

    GTID and MTS in production : What did we learn?

    Errant Transactions

    !  Replication Monitoring

    !

     

    Building slaves using xtrabackup

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    45/57

    Errant Transaction

    The errant transactions are:They are only executed in slaves.

    !  Could result from a mistake

    !

     

    Could be intentionally by design, such as report tables

    !  Why they cause problem

    When the slave becomes the master during failover, it exchanges its ownset of executed GTIDs, then send any missing transactions to the slaves.

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    46/57

    Errant Transaction Detection and Fix

    Detect: GTID_SUBSET(slave-Executed_Gtid_Set, master-Executed_Gtid_Set)

    If it returns true(1), no errant trx.

    If it returns false(0), it does have errant trx.

    Identify:GTID_SUBTRACT(slave-Executed_Gtid_Set, master-Executed_Gtid_Set)It returns the errant GTID.

    Fix: Inject empty transaction on all other servers.

    If the transaction must be executed in slave only, useset sql_log_bin=0;

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    47/57

    Inject Empty Transaction

    Sql_skip_slave_counter=n no longer works

    Execute a fake trx with the GTID that you want to skip

    For example: GTID=68fb0071-299b-11e5-9cd6-78e7d15dbe38:501

    STOP SLAVE;SET GTID_NEXT="68fb0071-299b-11e5-9cd6-78e7d15dbe38:501";BEGIN; COMMIT;SET GTID_NEXT="AUTOMATIC";START SLAVE;

    SHOW SLAVE STATUS\G # Verification

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    48/57

    MySQL Replication Monitoring

    • 

    Seconds_Behind_Master

     A good approximation of how late the slave is only when the slaveactively processes updates.

    If the network is slow or not much updates in the master, this is NOTa good measurement.

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    49/57

    MySQL Replication Monitoring at Yahoo

    MySQL Health Heartbeat

    1. Master generates heartbeat by updating timestamp (last_update)

    2. Slave checks the difference between current time and last_update

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    50/57

    GTID MTS Monitoring Challenger

    !  SHOW SLAVE STATUS

    !  Seconds_Behind_Master is still a good indication of thereplication lag

    !  Retrieved_Gtid_Set: List of GTIDs received by the I/O thread,cleared after a server restart

    !  Executed_Gtid_Set: List of GTIDs executed by the SQL thread

     Auto_position: 1 if GTID-based replication is enabled

    !  5.7 is using performance_schema

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    51/57

    Build Slaves Using Xtrabackup

    !  Start Xtrabackup from either master or slave

    If the backup is taken from the master,

    Please check the file xtrabackup_binlog_info in the backup folder

    If the backup is from slave,

    Please check the file xtrabackup_slave_info

    $ cat xtrabackup_slave_info

    SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533';

    CHANGE MASTER TO MASTER_AUTO_POSITION=1

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    52/57

    Build Slave Using Xtrabackup

    Enable Replication in Slave

    Issuemysql> SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533';

    ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when@@GLOBAL.GTID_EXECUTED is empty.

    How to fix

    !  RESET MASTER;

    !  SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533’;

    !  CHANGE MASTER TO MASTER_HOST="mastername", master_user='rep_user',master_password='rep_password', MASTER_AUTO_POSITION = 1;

    !  START SLAVE;

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    53/57

    Build Slave Using Xtrabackup

    Still issue?

    mysql> start slave;ERROR 1872 (HY000): Slave failed to initialize relay log info structure from the repository

    RESET SLAVE;

    START SLAVE;

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    54/57

    Summary

    !  GTID

    !  MTS

    !  GTID with MTS performance comparison

    GTID with MTS online rollout!  Things to watch out

    !  Rebuild slave

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    55/57

    We would love to talk more ..

    http://mysqlatyahoo.tumblr.com 

     YJ

    [email protected]

    https://www.linkedin.com/pub/yashada-jadhav/18/659/a6 

    Stacy Yuan

    [email protected]

    https://www.linkedin.com/pub/stacy-yuan/53/577/324 

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    56/57

     Appendix

  • 8/17/2019 CON5409_Jadhav-Yahoo Case Study- MySQL GTIDs and Parallel or Multithreaded Replication

    57/57

    Appendix: 1. GTID with MTS Configuration and Setup

    ! Hardware: 24 CPUs, 48 GB memory

    ! Database: innodb_buffer_pool_size = 32 GB

    innodb_log_file_size = 1G

    innodb_thread_concurrency = 0Case1:

    ! Master: Create 8 schemas and 8 users, each user access oneschema

    Create 8 tables and 1 million rows in each table for each schema

    Eight sysbench run executed concurrently on the master.

    num-threads=2

    max-time=900

    oltp-read-only=off