group replication: a journey to the group communication core · replication plugin api mysql server...

38
Copyright © 2017, Oracle and/or its affiliates. All rights reserved. Group Replication: A Journey to the Group Communication Core Alfranio Correia ([email protected]) Principal Software Engineer 4th of February Oracle / Fosdem 2017 1

Upload: others

Post on 28-Sep-2020

12 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.

GroupReplication:AJourneytotheGroupCommunicationCore

Alfranio Correia([email protected])PrincipalSoftwareEngineer

4thofFebruary Oracle/Fosdem2017 1

Page 2: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

SafeHarborStatementThefollowingisintendedtooutlineourgeneralproductdirection.Itisintendedforinformationpurposesonly,andmaynotbeincorporatedintoanycontract.Itisnotacommitmenttodeliveranymaterial,code,orfunctionality,andshouldnotberelieduponinmakingpurchasingdecisions.Thedevelopment,release,andtimingofanyfeaturesorfunctionalitydescribedforOracle’sproductsremainsatthesolediscretionofOracle.

24thofFebruary Oracle/Fosdem2017

Page 3: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017,Oracleand/oritsaffiliates.Allrightsreserved.|

ProgramAgenda

4thofFebruary Oracle/Fosdem2017

Page 4: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Background

GroupCommunicationInterface

Group Communication Engine

Performance

Conclusion

ProgramAgenda

4thofFebruary Oracle/Fosdem2017 4

1

2

3

4

5

Page 5: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017,Oracleand/oritsaffiliates.Allrightsreserved.|

Background

4thofFebruary Oracle/Fosdem2017

1

Page 6: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

MySQLInnoDB Cluster

64thofFebruary Oracle/Fosdem2017

S1 S2 S3 S4 S…

M

M M

MySQLConnectorApplication

MySQLRouter

MySQLConnectorApplication

MySQLRouter

MySQLShell

HA

ReplicaSet

1

S1 S2 S3 S4 S…

M

M M

MySQLConnectorApplication

MySQLRouter

HA

ReplicaSet2

ReplicaSet3

MySQLConnectorApplication

MySQLRouter

S1 S2 S3 S4

M

M M

HA

Page 7: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

MySQLGroupReplication• WhatisMySQLGroupReplication?“Multi-masterupdateeverywhere replicationpluginforMySQLwithbuilt-inautomaticdistributedrecovery,conflictdetection andgroupmembership.”

• WhatdoestheMySQLGroupReplicationplugindofortheuser?– Automates serverfailover inSinglePrimary– Providesfault tolerance– Enablesupdateeverywhere setups– Automatesgroupreconfiguration(handlingofcrashes,failures,re-connects)– Providesahighlyavailablereplicated database

74thofFebruary Oracle/Fosdem2017

Page 8: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

MajorBuilding Blocks

84thofFebruary Oracle/Fosdem2017

M M M M MCom.API

ReplicationPlugin

API

MySQLServer

Group Comm.System (Corosync)GroupCom.Engine

Page 9: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

TheCompleteStack

94thofFebruary Oracle/Fosdem2017

API

ReplicationPlugin

API

MySQLServer

PerformanceSchemaTables:Monitoring

MySQL

APIs:Lifecycle/Capture/Applier

InnoDBReplicationProtocol

GroupCom.API

GroupCom.Engine

Network

PluginCapture ApplierConflicts

Handler

GroupComm.System(Corosync)GroupCom.Engine

GroupCom.Binding

Recovery

Page 10: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017,Oracleand/oritsaffiliates.Allrightsreserved.|

Group Communication Interface

4thofFebruary Oracle/Fosdem2017

2

Page 11: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Design• Abstract interfacetosupport different solutions– Reconfigurethe group and get membership information– Send and receive messages

• Usesthe observer pattern–MySQLGroupReplication listens toevents

• Different implementations perCommunication Systems• Made the transition from Corosync easy

114thofFebruary Oracle/Fosdem2017

Page 12: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Semantics• Closed Group–Only group members cansend and receive messages

• TotalOrder–Messages aretotally ordered among each other

• SafeDelivery–One cannot deliver amessage if the majority can’t doso

• View Synchrony– Changes tomembership aretolltaly ordered with messages

124thofFebruary Oracle/Fosdem2017

Page 13: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017,Oracleand/oritsaffiliates.Allrightsreserved.|

Group Communication Engine

4thofFebruary Oracle/Fosdem2017

3

Page 14: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Built-inCommunicationEngine• Based on provendistributedsystemsalgorithms(Paxos)– Compression,multi-platform,dynamicmembership,SSL,IPwhitelisting

• Nothird-partysoftwarerequired• Nonetworkmulticastsupport required–MySQLGroupReplicationcanoperateoncloudbasedinstallationswheremulticastisunsupported

144thofFebruary Oracle/Fosdem2017

Page 15: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

PaxosFamily and Friends

154thofFebruary Oracle/Fosdem2017

Multi-Paxos

Fast Paxos

Disk Paxos

Cheap Paxos

VerticalPaxos

Generalized Paxos

Raft

Mencius

Flexible Paxos

Egalitarian Paxos

Byzantine Paxos

Page 16: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

BasicPaxos

164thofFebruary Oracle/Fosdem2017

M0

M1

M2

Prepare/Election Phase

M0

M1

M2

Accept Phase

M0

M1

M2

Learn Phase

• Get agreement on avalue:– Next message/transaction tobedelivered

• Members may have different roles:– Usually all members areproposers,acceptors and learners

• Need aquorum tomake progress– Usually amajority

1 2

3

Page 17: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

PreparePhase

174thofFebruary Oracle/Fosdem2017

• Proposer sends apreparerequest with number “n”tomembers (i.e.acceptors)• If an acceptor has not received arequest with anumber greater than “n”,it will respond• It will promise not toaccept arequest numberedless than “n”• If the reply has anon-empty value,the leaderwillusethat with the highest number

M0

M1

M2

Prepare1.1

M0

M1

M2

Promise1.2

(n)

(n)

(y,value)

(x,value)

Page 18: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Accept Phase

184thofFebruary Oracle/Fosdem2017

• If the leaderfinds outthat anon-empty value hasbeen previously proposed,it will useit• Otherwise,it will propose anew value• Requires anetworkround-triptoget agreement

M0

M1

M2

Accept2.1

M0

M1

M2

Accepted2.2

(n,value)

(n,value)

(ack)

(ack)

Page 19: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Learn Phase

194thofFebruary Oracle/Fosdem2017

• It will inform other members about the decision• Only one learner is required tohave progress• If the member already has the value,an ack isenough

M0

M1

M2

Learn3

(value)

(value)

Page 20: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Multi-Paxos

204thofFebruary Oracle/Fosdem2017

slot 0

0

1

2

Accept/Learn0

1

2

Accept/Learn0

1

2

Accept/Learn0

1

2

Election0

1

2

Accept/Learn0

1

2

Election

slot 1 slot 2 slot 3 ...

• Consensus roundtodecideon each slot’s content

• Replicated LogStream

Page 21: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

So what?• They caneasily become abottleneck• Multiple leaders:eXtended COMmunications

214thofFebruary Oracle/Fosdem2017

Page 22: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

How doesXCOMwork?

224thofFebruary Oracle/Fosdem2017

slot 0

0

1

2

Accept/Learn

slot 1 slot 2 slot 3

0

1

2

Accept/Learn

slot 4 slot 5 ......

0

1

2

Accept/Learn0

1

2

Accept/Learn0

1

2

Accept/Learn0

1

2

Accept/Learn

• Every member is aleaderso noleaderelection

• Every member owns aIn-Memory Replicated Log

Page 23: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Nothing toPropose

234thofFebruary Oracle/Fosdem2017

slot 0

0

1

2

Accept/Learn

nop slot 2 slot 3

0

1

2

Accept/Learn

nop slot 5 ......

0

1

2

Accept/Learn0

1

2

Learn0

1

2

Accept/Learn0

1

2

Learn

• Only alearn message with a“nop”is enough

Page 24: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

How is the optimization possible?• Member “1”sends alearn message “(0,nop)”tomember “4”and dies• Non-leaderscanonly propose “nop”(s)on behalf of others• They mustgo through all Paxosphases

244thofFebruary Oracle/Fosdem2017

0

2

3

1

4

Learn

1

2

3

0

4

(1)

(1)

1

2

3

0

4

(0,-)

(0,-)

1

2

3

0

4

(1,nop)

(1,nop)

1

2

3

0

4

(ack)

(ack)

Prepare Promise Accept Accepted

1

2

3

0

4

(nop)

(nop)

Learn

(0,nop)

Page 25: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

HandlingFailures/Suspicions

254thofFebruary Oracle/Fosdem2017

slot 0

0

1

2

Accept/Learn0

1

2

Accept/Learn0

1

2

Prep./Accept/Learn

slot 1 slot 2 nop

0

1

2

Accept/Learn0

1

2

Accept/Learn

slot 4

0

1

2

Accept/Learn

slot 5 ......

Page 26: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Implemented Optimizations inXCOM• Pipeline– Proposes several “transactions”inparallel– Improvesperformanceinhigh latency networks– Current value is “10”

• Batch– ImprovesCPUusage– Improvesperformanceinhigh latency/low bandwidth networks– Current value is “5”

264thofFebruary Oracle/Fosdem2017

Page 27: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Implemented Optimizations inBiding• Compression– Reduces bandwith consumption

• Automatically reconfigureagroup– Faulty members areexpelled

274thofFebruary Oracle/Fosdem2017

Page 28: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017,Oracleand/oritsaffiliates.Allrightsreserved.|

Performance

4thofFebruary Oracle/Fosdem2017

6

Page 29: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Configuration• Multipe writers – One perServer• Singlewriter – Just one client• OracleServerX5-2Lwith two IntelXeon E5-2660-V3processors– 20Cores– 40HardwareThreads

• OracleEnterprise Linux7,kernel 3.8.13-118.13.3• 10Gbps ethernet• Used “tc”tothrottle network

294thofFebruary Oracle/Fosdem2017

Page 30: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Multiple writers (256Bytes)

304thofFebruary Oracle/Fosdem2017

3members 5members 7members 3members 5members 7members

Uncompressed256bytepayload Compressed256bytepayload

0

20000

40000

60000

80000

100000

120000

140000

16000010Gbpsnetworkwith0.1mslatency

200Mbpsnetworkwith7mslatency

• Compression improvesperformanceinMetropolitan

• Headers arenot compressed (~200bytes)though

Messagesp

erse

cond

sent

Page 31: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Multiple writers (1KBytes)

314thofFebruary Oracle/Fosdem2017

• Check whether compression may help or not

• Usually helps when bandwidth is aproblem

3members 5members 7members 3members 5members 7members

Uncompressed1Kpayload Compressed1Kpayload

0

20000

40000

60000

80000

100000

12000010Gbpsnetworkwith0.1mslatency

200Mbpsnetworkwith7mslatency

Messagesp

erse

cond

sent

Page 32: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

SingleWriter (1KBytes)

324thofFebruary Oracle/Fosdem2017

3members 5members 7members 3members 5members 7members

Uncompressed1Kpayload Compressed1Kpayload

0

20000

40000

60000

80000

100000

12000010Gbpsnetworkwith0.1mslatency

200Mbpsnetworkwith7mslatency

• The scale outeffect with multiple writers is small

• Compression doesnot help here

Messagesp

erse

cond

sent

Page 33: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017,Oracleand/oritsaffiliates.Allrightsreserved.|

Conclusion

4thofFebruary Oracle/Fosdem2017

5

Page 34: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Current Status• Has made into MySQL 5.7.17release• GAinDecember 2016

344thofFebruary Oracle/Fosdem2017

Page 35: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Future• Configurable Paxosrole(s)– Leader/Acceptor/Learner or Acceptor/Learner or Learner

• Multiple leadersonly if needed:– Avoids the skip message– ImprovesCPUand networkusage

• Not all members need tomake messages networkdurable– Reduces resilience but improvesperformance

354thofFebruary Oracle/Fosdem2017

Page 36: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Future• Expose someconfiguration options:– Batch– Pipeline

• Compression at low level layers aswell• Write tonetworkinparallel• Overlay networks

364thofFebruary Oracle/Fosdem2017

Page 37: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier

Copyright©2017, Oracleand/oritsaffiliates.Allrightsreserved.|

Wheretogofromhere?• Packages– http://www.mysql.com/downloads/

• Documentation– http://dev.mysql.com/doc/refman/5.7/en/group-replication.html

• BlogsfromtheEngineers(news,technicalinformation,andmuchmore)– http://mysqlhighavailability.com

374thofFebruary Oracle/Fosdem2017

Page 38: Group Replication: A Journey to the Group Communication Core · Replication Plugin API MySQL Server Performance Schema Tables: Monitoring MySQL APIs: Lifecycle / Capture / Applier