
Copyright © 2017, Oracle and/or its affiliates. All rights reserved.

Group Replication: A Journey to the Group Communication Core

Alfranio Correia (alfranio.correia@oracle.com)
Principal Software Engineer

4th of February, Oracle/FOSDEM 2017


Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.


Program Agenda

1. Background
2. Group Communication Interface
3. Group Communication Engine
4. Performance
5. Conclusion


1. Background


MySQL InnoDB Cluster

[Figure: MySQL InnoDB Cluster. Applications use a MySQL Connector and reach the servers through MySQL Router, and MySQL Shell is used for administration. Each highly available (HA) replica set (ReplicaSet 1, ReplicaSet 2, ReplicaSet 3) consists of servers S1, S2, S3, S4, … marked M.]


MySQL Group Replication
• What is MySQL Group Replication?
  “Multi-master update everywhere replication plugin for MySQL with built-in automatic distributed recovery, conflict detection and group membership.”

• What does the MySQL Group Replication plugin do for the user?
  – Automates server failover in Single Primary mode
  – Provides fault tolerance
  – Enables update everywhere setups
  – Automates group reconfiguration (handling of crashes, failures, re-connects)
  – Provides a highly available replicated database


Major Building Blocks

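[Figure: several members (M), each running a MySQL Server with the Replication Plugin attached through an API. The plugin uses the Group Communication API (Com. API), which is implemented either by a group communication system (Corosync) or by the Group Communication Engine.]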


The Complete Stack

[Figure: the complete stack. The MySQL Server (with InnoDB) exposes APIs for Lifecycle/Capture/Applier and Performance Schema tables for monitoring. The Replication Plugin contains Capture, Applier, Conflicts Handler, Recovery and the Replication Protocol. It reaches the network through the Group Com. API and the Group Com. Binding, backed either by the Group Com. Engine or by a Group Comm. System (Corosync).]


2. Group Communication Interface


Design
• Abstract interface to support different solutions
  – Reconfigure the group and get membership information
  – Send and receive messages
• Uses the observer pattern
  – MySQL Group Replication listens to events
• Different implementations per group communication system
• Made the transition from Corosync easy
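The interface design above can be sketched in Python as a subject/observer pair. This is a minimal illustration, not the plugin's actual API. The class and method names (`GroupCommunicationInterface`, `GroupEventListener`, `LoopbackGroup`, `on_message`, `on_view_change`) are hypothetical. `LoopbackGroup` stands in for a real engine such as Corosync or XCOM by delivering every message in-process.

```python
from abc import ABC, abstractmethod
from typing import List, Tuple


class GroupEventListener(ABC):
    """Observer side: callbacks the replication plugin would implement."""

    @abstractmethod
    def on_message(self, sender: str, payload: bytes) -> None: ...

    @abstractmethod
    def on_view_change(self, members: List[str]) -> None: ...


class GroupCommunicationInterface(ABC):
    """Subject side: one concrete subclass per group communication system."""

    def __init__(self) -> None:
        self._listeners: List[GroupEventListener] = []

    def add_listener(self, listener: GroupEventListener) -> None:
        self._listeners.append(listener)

    @abstractmethod
    def join(self, member: str) -> None: ...   # reconfigure the group

    @abstractmethod
    def leave(self, member: str) -> None: ...  # reconfigure the group

    @abstractmethod
    def members(self) -> List[str]: ...        # membership information

    @abstractmethod
    def send(self, sender: str, payload: bytes) -> None: ...  # send a message

    def _notify_message(self, sender: str, payload: bytes) -> None:
        for listener in self._listeners:
            listener.on_message(sender, payload)

    def _notify_view(self) -> None:
        for listener in self._listeners:
            listener.on_view_change(self.members())


class LoopbackGroup(GroupCommunicationInterface):
    """Toy in-process engine: delivers every message immediately, in send order."""

    def __init__(self) -> None:
        super().__init__()
        self._members: List[str] = []

    def join(self, member: str) -> None:
        self._members.append(member)
        self._notify_view()

    def leave(self, member: str) -> None:
        self._members.remove(member)
        self._notify_view()

    def members(self) -> List[str]:
        return list(self._members)

    def send(self, sender: str, payload: bytes) -> None:
        if sender not in self._members:  # closed group: outsiders cannot send
            raise PermissionError(f"{sender} is not a group member")
        self._notify_message(sender, payload)


class RecordingListener(GroupEventListener):
    """Stands in for the replication plugin: records every event it observes."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, object]] = []

    def on_message(self, sender: str, payload: bytes) -> None:
        self.events.append(("message", (sender, payload)))

    def on_view_change(self, members: List[str]) -> None:
        self.events.append(("view", tuple(members)))
```

With this shape, supporting another engine means adding one more subclass of the interface. The listener, which plays the role of the replication plugin, stays unchanged.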


Semantics
• Closed Group
  – Only group members can send and receive messages
• Total Order
  – All members deliver messages in the same order
• Safe Delivery
  – A member cannot deliver a message unless a majority of the members can also do so
• View Synchrony
  – Membership changes are totally ordered with messages


3. Group Communication Engine


Built-in Communication Engine
• Based on proven distributed systems algorithms (Paxos)
  – Compression, multi-platform, dynamic membership, SSL, IP whitelisting
• No third-party software required
• No network multicast support required
  – MySQL Group Replication can operate on cloud-based installations where multicast is unsupported


Paxos Family and Friends


• Multi-Paxos
• Fast Paxos
• Disk Paxos
• Cheap Paxos
• Vertical Paxos
• Generalized Paxos
• Raft
• Mencius
• Flexible Paxos
• Egalitarian Paxos
• Byzantine Paxos


Basic Paxos


[Figure: members M0, M1 and M2 exchanging messages in three phases: (1) Prepare/Election, (2) Accept and (3) Learn.]

• Get agreement on a value:
  – Next message/transaction to be delivered
• Members may have different roles:
  – Usually all members are proposers, acceptors and learners
• Need a quorum to make progress
  – Usually a majority
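The quorum rule can be made concrete with a little arithmetic. Assuming a plain majority quorum, the helpers below compute the quorum size and the number of member failures a group survives. The helper names are hypothetical. The group sizes are the 3, 5 and 7 members measured in the Performance section.

```python
def majority_quorum(n: int) -> int:
    """Smallest number of members that is a majority of n."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """Members that can fail while the rest still form a majority."""
    return n - majority_quorum(n)


for n in (3, 5, 7):
    # Any two majorities overlap, so two quorums can never decide different values.
    assert 2 * majority_quorum(n) > n
    print(n, majority_quorum(n), tolerated_failures(n))
# prints:
# 3 2 1
# 5 3 2
# 7 4 3
```

An even-sized group adds no fault tolerance. For example, 4 members still tolerate only one failure, because the quorum grows to 3.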


Prepare Phase


• The proposer sends a prepare request with number “n” to the members (i.e. the acceptors)
• If an acceptor has not received a request with a number greater than “n”, it responds
• It promises not to accept any request numbered less than “n”
• If any reply carries a non-empty (previously accepted) value, the leader uses the one with the highest number

[Figure: (1.1) Prepare: M0 sends (n) to M1 and M2. (1.2) Promise: M1 and M2 reply with (y, value) and (x, value).]
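The Prepare phase can be sketched as follows. This is a minimal single-decree illustration, not XCOM's implementation. It assumes unique proposal numbers, and `Acceptor`, `on_prepare` and `choose_value` are hypothetical names. The sketch follows the slide's rule: an acceptor answers unless it has already seen a higher number.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

Accepted = Optional[Tuple[int, Any]]  # (proposal number, value) or None


@dataclass
class Acceptor:
    promised: int = -1         # highest prepare number this acceptor answered
    accepted: Accepted = None  # last proposal this acceptor accepted

    def on_prepare(self, n: int) -> Optional[Tuple[int, Accepted]]:
        """Phase 1.2 (promise): answer unless a higher number was already seen."""
        if n < self.promised:
            return None            # ignore stale prepare requests
        self.promised = n          # promise: accept nothing numbered below n
        return (n, self.accepted)  # report any previously accepted value


def choose_value(promises: List[Tuple[int, Accepted]], own_value: Any) -> Any:
    """Proposer: reuse the highest-numbered accepted value, else propose its own."""
    previous = [acc for _, acc in promises if acc is not None]
    if previous:
        return max(previous, key=lambda acc: acc[0])[1]
    return own_value
```

Reusing an already-accepted value is what keeps Paxos safe. If some value may already have been chosen, a new leader must carry it forward instead of overwriting it.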


Accept Phase


• If the leader finds out that a non-empty value has been previously proposed, it uses that value
• Otherwise, it proposes a new value
• Requires a network round-trip to get agreement

[Figure: (2.1) Accept: M0 sends (n, value) to M1 and M2. (2.2) Accepted: M1 and M2 reply with (ack).]
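The Accept phase can be sketched the same way. The leader sends `(n, value)`, and the value is chosen once a majority acknowledges it. That is the single network round-trip mentioned above. `AcceptorState`, `on_accept` and `accept_round` are hypothetical names. The loop calls acceptors synchronously, standing in for the real request/reply exchange.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple


@dataclass
class AcceptorState:
    promised: int = -1                          # highest prepare number promised
    accepted: Optional[Tuple[int, Any]] = None  # last (number, value) accepted

    def on_accept(self, n: int, value: Any) -> bool:
        """Phase 2.2: accept unless a higher-numbered prepare was promised."""
        if n < self.promised:
            return False
        self.promised = n
        self.accepted = (n, value)
        return True  # the "ack"


def accept_round(acceptors: List[AcceptorState], n: int, value: Any) -> bool:
    """Phase 2.1: one round-trip; the value is chosen once a majority acks."""
    acks = sum(1 for a in acceptors if a.on_accept(n, value))
    return acks > len(acceptors) // 2
```

An acceptor that promised a higher number refuses the request. Two of three acks are still enough to choose the value.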


Learn Phase


• The leader informs the other members about the decision
• Only one learner is required to make progress
• If a member already has the value, an ack is enough

[Figure: (3) Learn: M0 sends (value) to M1 and M2.]
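The Learn-phase shortcut, sending an ack instead of the value, can be sketched as two helpers. The message tuples, `learn_messages` and `Learner` are hypothetical. The sketch only shows that a member which already received the value in the Accept phase can be told about the decision without the value being resent.

```python
from typing import Any, Dict, List, Tuple


def learn_messages(n: int, value: Any, has_value: Dict[str, bool]) -> List[Tuple[str, tuple]]:
    """Leader side: announce that proposal n was chosen.

    A member that already holds the value (it accepted it in the Accept phase)
    only needs a short notice; the others need the value itself.
    """
    out = []
    for member, known in sorted(has_value.items()):
        out.append((member, ("learn", n) if known else ("learn", n, value)))
    return out


class Learner:
    def __init__(self) -> None:
        self.seen: Dict[int, Any] = {}    # values received in Accept messages
        self.chosen: Dict[int, Any] = {}  # values known to be decided

    def on_accept(self, n: int, value: Any) -> None:
        self.seen[n] = value

    def on_learn(self, msg: tuple) -> None:
        n = msg[1]
        # A short notice refers to the value seen earlier; otherwise take the payload.
        self.chosen[n] = msg[2] if len(msg) == 3 else self.seen[n]
```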


Multi-Paxos


[Figure: a replicated log of slots (slot 0, slot 1, slot 2, slot 3, …) among members 0, 1 and 2. An Election phase runs only when a new leader takes over; otherwise each slot is decided with the Accept/Learn phases.]

• A consensus round decides each slot's content
• Replicated log stream
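A replicated log of slots can be sketched with a dictionary keyed by slot number. The sketch assumes that decisions may arrive out of order, for example when several slots are in flight, while delivery must follow slot order. `ReplicatedLog` and `learn` are hypothetical names.

```python
from typing import Any, Dict, List


class ReplicatedLog:
    """Slots decided by independent consensus rounds, delivered in slot order."""

    def __init__(self) -> None:
        self.decided: Dict[int, Any] = {}  # slot -> decided value, not yet delivered
        self.next_to_deliver = 0           # first slot not yet delivered

    def learn(self, slot: int, value: Any) -> List[Any]:
        """Record one slot's decision; return the values that became deliverable."""
        if slot < self.next_to_deliver:
            return []  # duplicate decision for an already delivered slot
        self.decided[slot] = value
        ready = []
        while self.next_to_deliver in self.decided:
            ready.append(self.decided.pop(self.next_to_deliver))
            self.next_to_deliver += 1
        return ready
```

For example, learning slot 1 first returns nothing. Learning slot 0 afterwards releases both values in order.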


So what?
• A single leader can easily become a bottleneck
• Multiple leaders: eXtended COMmunications (XCOM)


How does XCOM work?


[Figure: slots 0, 1, 2, 3, 4, 5, … among members 0, 1 and 2. Every slot is decided with the Accept/Learn phases only, with different members proposing in different slots.]

• Every member is a leader, so there is no leader election
• Every member owns an in-memory replicated log
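One way to read "every member is a leader" is that slots are partitioned among the members, so no two members ever compete for the same slot. The sketch assumes a round-robin partition, where slot k belongs to member k mod n. That is our reading of the figure, not a documented XCOM rule. `owner` and `own_slots` are hypothetical helpers.

```python
from typing import List


def owner(slot: int, n_members: int) -> int:
    """Assumed round-robin ownership: slot k belongs to member k mod n."""
    return slot % n_members


def own_slots(member: int, n_members: int, count: int) -> List[int]:
    """The next `count` slots a member may fill on its own, without an election."""
    return [member + i * n_members for i in range(count)]
```

With three members, slots 0 to 5 belong to members 0, 1, 2, 0, 1, 2. Because each slot has exactly one owner, that owner can go straight to Accept/Learn, as in the figure.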


Nothing to Propose


[Figure: slots 0 to 5, …; slots 1 and 4 hold a “nop” and are decided with a Learn message only, while the other slots go through Accept/Learn.]

• When a member has nothing to propose for its slot, a learn message with a “nop” is enough
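The “nop” shortcut can be simulated by letting each member fill its own slots from a queue of pending messages. The sketch assumes round-robin slot ownership (as read from the figure) and reuses the phase names from the Basic Paxos slides. `fill_slots` is a hypothetical helper. With members 0 and 2 busy and member 1 idle, slots 1 and 4 become nops, as in the figure.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

NOP = "nop"


def fill_slots(queues: Dict[int, Deque[str]], n_slots: int) -> List[Tuple[str, Tuple[str, ...]]]:
    """Decide slots 0..n_slots-1, each by its (assumed round-robin) owner.

    An owner with pending messages proposes the next one (Accept/Accepted/Learn);
    an idle owner decides a nop with a Learn message alone.
    """
    log = []
    for slot in range(n_slots):
        pending = queues[slot % len(queues)]
        if pending:
            log.append((pending.popleft(), ("accept", "accepted", "learn")))
        else:
            log.append((NOP, ("learn",)))
    return log


queues = {0: deque(["a", "c"]), 1: deque(), 2: deque(["b", "d"])}
print([value for value, _ in fill_slots(queues, 6)])  # ['a', 'nop', 'b', 'c', 'nop', 'd']
```

Without the nop, slots 2 and 3 could not be delivered until member 1 had something to send. The idle member's single Learn message keeps the log moving.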


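How is the optimization possible?
• Member “1” sends a learn message “(0,nop)” to member “4” and dies
• Non-leaders can only propose “nop”(s) on behalf of others
• They must go through all Paxos phases
• Hence any value another member gets decided in member “1”'s slot is also a “nop”, so the learn-only shortcut cannot conflict with it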


[Figure: members 0 to 4. Member 1 sends Learn (0,nop) to member 4 and dies. A surviving member then proposes the nop through all the phases: Prepare (1), Promise (0,-), Accept (1,nop), Accepted (ack) and Learn (nop).]
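The rule behind this optimization can be written as a small decision function. It is a sketch of the reasoning on this slide, using a hypothetical name (`phases_for`) and the assumption that each slot has a single owner. Only a slot's owner may propose a real value, so the owner's nop needs nothing but a Learn message. Any other member proposing a nop must run every phase.

```python
from typing import Tuple

NOP = "nop"


def phases_for(proposer: int, slot_owner: int, value: str) -> Tuple[str, ...]:
    """Message phases needed to decide `value` in a slot owned by `slot_owner`."""
    if proposer != slot_owner:
        # Non-owners may only propose a nop on the owner's behalf, and must run
        # every phase because the owner might already have proposed a value.
        if value != NOP:
            raise ValueError("non-owners can only propose nop")
        return ("prepare", "promise", "accept", "accepted", "learn")
    if value == NOP:
        # Only the owner could propose a real value here, so every possible
        # outcome is a nop and a Learn message is enough.
        return ("learn",)
    return ("accept", "accepted", "learn")
```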


Handling Failures/Suspicions


[Figure: slots 0, 1, 2, 4, 5, … are decided with Accept/Learn, while slot 3, belonging to a failed or suspected member, is filled with a “nop” through Prepare/Accept/Learn.]


Implemented Optimizations in XCOM
• Pipeline
  – Proposes several “transactions” in parallel
  – Improves performance in high-latency networks
  – Current value is “10”
• Batch
  – Improves CPU usage
  – Improves performance in high-latency/low-bandwidth networks
  – Current value is “5”
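The effect of these two settings can be estimated with idealized arithmetic. The sketch reads "Pipeline = 10" as up to ten proposals in flight and "Batch = 5" as up to five messages per proposal. That reading, the one-round-trip-per-window cost model and the helper names are all assumptions, not XCOM internals.

```python
import math
from typing import List

PIPELINE = 10  # assumed meaning: proposals in flight at once (slide value)
BATCH = 5      # assumed meaning: messages packed into one proposal (slide value)


def batches(messages: List[str], batch: int = BATCH) -> List[List[str]]:
    """Pack pending messages into proposals of at most `batch` messages."""
    return [messages[i:i + batch] for i in range(0, len(messages), batch)]


def round_trips(n_messages: int, pipeline: int = PIPELINE, batch: int = BATCH) -> int:
    """Idealized number of network round-trips needed to order n_messages.

    Model (an assumption): each proposal needs one Accept round-trip, and a
    full window of `pipeline` proposals completes in the same round-trip.
    """
    proposals = math.ceil(n_messages / batch)
    return math.ceil(proposals / pipeline)
```

Under this model, ordering 100 messages takes `round_trips(100) == 2` round-trips, compared with 100 when neither optimization is used. On the 7 ms link used in the Performance section, that is about 14 ms rather than 700 ms, which is why both settings matter most on high-latency networks.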


Implemented Optimizations in Binding
• Compression
  – Reduces bandwidth consumption
• Automatically reconfigures a group
  – Faulty members are expelled
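Compression in the binding layer can be sketched as a framing step that compresses large payloads and flags them. The sketch uses Python's `zlib` purely as a stand-in codec. The threshold, the one-byte flag and the function names are hypothetical, and the talk does not describe the real wire format.

```python
import zlib

THRESHOLD = 1024  # hypothetical: payloads smaller than this are sent as-is


def encode(payload: bytes, threshold: int = THRESHOLD) -> bytes:
    """Frame a payload: 1-byte flag, then the (possibly compressed) body."""
    if len(payload) >= threshold:
        body = zlib.compress(payload)
        if len(body) < len(payload):  # only keep compression when it pays off
            return b"\x01" + body
    return b"\x00" + payload


def decode(frame: bytes) -> bytes:
    """Undo encode(): inspect the flag and decompress if needed."""
    flag, body = frame[:1], frame[1:]
    return zlib.decompress(body) if flag == b"\x01" else body
```

Skipping small or incompressible payloads avoids spending CPU on messages where compression cannot reduce bandwidth.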


4. Performance


Configuration
• Multiple writers – one per server
• Single writer – just one client
• Oracle Server X5-2L with two Intel Xeon E5-2660 v3 processors
  – 20 cores
  – 40 hardware threads
• Oracle Enterprise Linux 7, kernel 3.8.13-118.13.3
• 10 Gbps Ethernet
• Used “tc” to throttle the network


Multiple writers (256 Bytes)


[Figure: messages sent per second for 3, 5 and 7 members, with an uncompressed and a compressed 256-byte payload, on a 10 Gbps network with 0.1 ms latency and on a 200 Mbps network with 7 ms latency.]

• Compression improves performance in metropolitan networks
• Headers are not compressed (~200 bytes), though
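Because the roughly 200-byte header stays uncompressed, it limits how much compression can save on small messages. The helpers below (hypothetical names) do that arithmetic for an assumed compression ratio. The 0.5 ratio in the usage note is illustrative, not a measured value.

```python
HEADER_BYTES = 200  # approximate uncompressed header size quoted on the slide


def wire_bytes(payload: int, ratio: float, header: int = HEADER_BYTES) -> float:
    """Bytes on the wire when only the payload is compressed.

    `ratio` is compressed size / original size (1.0 means no compression).
    """
    return header + payload * ratio


def saving(payload: int, ratio: float, header: int = HEADER_BYTES) -> float:
    """Fraction of wire bytes saved compared with sending the payload uncompressed."""
    return 1 - wire_bytes(payload, ratio, header) / wire_bytes(payload, 1.0, header)
```

Suppose the payload halves in size. A 256-byte message then shrinks from 456 to 328 bytes on the wire, about 28% saved. A 1 KB (1024-byte) message shrinks from 1224 to 712 bytes, about 42% saved.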


Multiple writers (1 KBytes)


• Check whether compression helps or not
• It usually helps when bandwidth is a problem

[Figure: messages sent per second for 3, 5 and 7 members, with an uncompressed and a compressed 1 KB payload, on a 10 Gbps network with 0.1 ms latency and on a 200 Mbps network with 7 ms latency.]


Single Writer (1 KBytes)


[Figure: messages sent per second for 3, 5 and 7 members, with an uncompressed and a compressed 1 KB payload, on a 10 Gbps network with 0.1 ms latency and on a 200 Mbps network with 7 ms latency.]

• The scale-out effect with multiple writers is small
• Compression does not help here


5. Conclusion


Current Status
• Has made it into the MySQL 5.7.17 release
• GA in December 2016


Future
• Configurable Paxos role(s)
  – Leader/Acceptor/Learner, Acceptor/Learner, or Learner only
• Multiple leaders only if needed
  – Avoids the skip message
  – Improves CPU and network usage
• Not all members need to make messages network-durable
  – Reduces resilience but improves performance


Future
• Expose some configuration options:
  – Batch
  – Pipeline
• Compression at lower-level layers as well
• Write to the network in parallel
• Overlay networks


Where to go from here?
• Packages
  – http://www.mysql.com/downloads/
• Documentation
  – http://dev.mysql.com/doc/refman/5.7/en/group-replication.html
• Blogs from the Engineers (news, technical information, and much more)
  – http://mysqlhighavailability.com
