Beyond TCP: the evolution of Internet transport protocols

Post on 08-Apr-2017


Beyond TCP: The evolution of Internet transport protocols

Olivier Bonaventure, UCL

http://inl.info.ucl.ac.be

Paris, Polytechnique, Jan. 2016

Agenda

•  Internet transport protocols
– TCP
– SCTP

•  Multipath TCP
– Basic principles
– Use cases

•  What's next?
– QUIC

The origins of TCP

Source: http://spectrum.ieee.org/computing/software/the-strange-birth-and-long-life-of-unix

The Unix pipe model

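[Figure: a Unix pipe between echo and wc carrying the bytes 1234 abbsbbbs]

The TCP bytestream model

[Figure: Client (IP: 1.2.3.4) and Server (IP: 4.5.6.7) exchange two bytestreams, ABCDEF...111232 and 0988989 ... XYZZ, one in each direction]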

TCP

More than 30 years old!

Congestion collapse

Jacobson, V. Congestion avoidance and control. In Proceedings of SIGCOMM '88 (Stanford, CA, Aug. 1988), ACM.

Performance issues

•  TCP considered to be too complex by many
– Software implementation cannot cope with increasing network bandwidth

•  For high performance, transport should be implemented in hardware
– Transputers
– Simpler transport protocols

More limitations of TCP

•  Issues with the TCP pipe model
– Only supports a single bytestream
•  Some applications need several streams with priorities
– No support for messages
– Connections are attached to one IP address on client and one IP address on server
•  No failover even if hosts have multiple interfaces
•  No support for mobility
•  No load balancing for multihomed hosts

SCTP: An alternative to TCP

SCTP in two slides

•  Modern transport protocol
–  Cleaner connection establishment
•  Four-way handshake to counter SYN flooding attacks
–  Cleaner protocol
•  Flexible TLV packet format that is easy to extend
•  Selective acknowledgements from the start
–  Richer semantics
•  Messages, multiple streams, unreliable delivery
•  Advanced API to replace socket API
–  Failover support
•  Connection can move from one IP address to another one

SCTP connection establishment

INIT, ITag=1234

INIT-ACK, cookie, ITag=5678

COOKIE-ECHO, VTag=5678, cookie

COOKIE-ACK, VTag=1234

Encrypts state in cookie, does not store it

Decrypts cookie, recovers info to create state
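
The stateless cookie idea can be sketched in a few lines. This is only an illustration under stated assumptions: the names `SECRET`, `make_cookie` and `open_cookie` are hypothetical, and the sketch protects the state with an HMAC (so it cannot be forged) rather than encrypting it as the slide describes.

```python
import hashlib
import hmac
import json
import os
from typing import Optional

# Hypothetical server-side secret; it never leaves the server.
SECRET = os.urandom(32)
TAG_LEN = 32  # size of a SHA-256 HMAC tag


def make_cookie(state: dict) -> bytes:
    """Serialise the pending connection state and append a MAC.

    The server sends this in INIT-ACK and keeps no per-connection state.
    """
    body = json.dumps(state, sort_keys=True).encode()
    tag = hmac.new(SECRET, body, hashlib.sha256).digest()
    return body + tag


def open_cookie(cookie: bytes) -> Optional[dict]:
    """Return the state carried by a COOKIE-ECHO, or None if it was altered."""
    body, tag = cookie[:-TAG_LEN], cookie[-TAG_LEN:]
    expected = hmac.new(SECRET, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    return json.loads(body)
```

Because the server only creates state after a valid cookie comes back, a flood of INIT chunks from spoofed addresses costs it no memory.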

What went wrong with SCTP?

•  Replacing a transport protocol

[Figure: protocol stack (Physical, Datalink, Network, Transport, Application) in which SCTP takes the place of TCP]

Applications must be rewritten with new API

IP protocol = 132 for SCTP packets

Deploying SCTP

•  Application developers will invest in SCTP as soon as SCTP is implemented on
– Clients
– Servers

The Internet architecture that we explain to our students

[Figure: protocol stacks. End hosts: Physical, Datalink, Network, Transport, Application. Intermediate devices: Physical only; Physical and Datalink; Physical, Datalink and Network]

O. Bonaventure, Computer Networking: Principles, Protocols and Practice, open ebook, http://inl.info.ucl.ac.be/cnp3

SCTP deployment

[Figure: protocol stacks of the end hosts (Physical, Datalink, Network, Transport, Application) and of the intermediate devices between them, with transport labels TCP, SCTP, SCTP, SCTP]

In reality

– almost as many middleboxes as routers
– various types of middleboxes are deployed

Sherry, Justine, et al. "Making middleboxes someone else's problem: Network processing as a cloud service." Proceedings of the ACM SIGCOMM 2012 conference. ACM, 2012.

Internet devices according to Cisco

http://www.cisco.com/web/about/ac50/ac47/2.html

[Figure: Cisco icons for Web Security Appliance, NAC Appliance, ACE XML Gateway, Streamer, VPN Concentrator, SSL Terminator, Cisco IOS Firewall, IP Telephony Router, PIX Firewall (Right and Left), Voice Gateway, Content Engine, NAT]

Middleboxes in the architecture

•  In the official architecture, they do not exist
•  In reality...

[Figure: protocol stacks along a path: stacks with Physical, Datalink, Network, Transport and Application layers, and one with Physical, Datalink, Network and TCP]

TCP segments processed by a router

[Figure: an IP header (Ver, IHL, ToS, Total length, Identification, Flags, Frag. Offset, TTL, Protocol, Checksum, Source IP address, Destination IP address, Options) followed by a TCP header (Source port, Destination port, Sequence number, Acknowledgment number, THL, Reserved, Flags, Window, Checksum, Urgent pointer, Options) and the Payload, shown before and after the router]

TCP segments processed by a NAT

[Figure: the IP header, TCP header, Options and Payload of a segment, shown before and after the NAT]

TCP segments processed by a NAT (2)

•  active mode ftp behind a NAT

220 ProFTPD 1.3.3d Server (BELNET FTPD Server) [193.190.67.15]
ftp_login: user `<null>' pass `<null>' host `ftp.belnet.be'
Name (ftp.belnet.be:obo): anonymous
---> USER anonymous
331 Anonymous login ok, send your complete email address as your password
Password:
---> PASS XXXX
---> PORT 192,168,0,7,195,120
200 PORT command successful
---> LIST
150 Opening ASCII mode data connection for file list
lrw-r--r-- 1 ftp ftp 6 Jun 1 2011 pub -> mirror
226 Transfer complete
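
The PORT command in this session carries the client's private address and port inside the TCP payload, which is why a NAT cannot handle active mode ftp by rewriting headers alone. A minimal sketch of decoding and re-encoding that argument follows; the function names are illustrative and do not come from any FTP library.

```python
def parse_port_argument(arg: str):
    """Decode 'h1,h2,h3,h4,p1,p2' into (ip, port), with port = p1 * 256 + p2."""
    fields = [int(x) for x in arg.split(",")]
    if len(fields) != 6 or any(not 0 <= f <= 255 for f in fields):
        raise ValueError("malformed PORT argument: %r" % arg)
    ip = ".".join(str(f) for f in fields[:4])
    port = fields[4] * 256 + fields[5]
    return ip, port


def format_port_argument(ip: str, port: int) -> str:
    """Encode (ip, port) back into the comma-separated PORT syntax."""
    return ",".join(ip.split(".") + [str(port // 256), str(port % 256)])


# The client above announces 192.168.0.7, a private address that the
# server cannot reach unless something on the path rewrites the payload.
print(parse_port_argument("192,168,0,7,195,120"))
```

Rewriting the argument with a public address of a different textual length changes the number of bytes in the payload, so the device doing it must also adjust sequence and acknowledgment numbers.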

TCP segments processed by an ALG running on a NAT

[Figure: the IP header, TCP header, Options and Payload of a segment, shown before and after the ALG]

© O. Bonaventure, 2011

How transparent is the Internet?

•  25th September 2010 to 30th April 2011
•  142 access networks
•  24 countries
•  Sent specific TCP segments from client to a server in Japan

Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.

End-to-end transparency today

[Figure: the IP header, TCP header, Options and Payload of a segment, shown before and after a middlebox]

Middleboxes don't change the Protocol field, but some discard packets with a Protocol field different than TCP or UDP

Agenda

•  Internet transport protocols
– TCP
– SCTP

•  Multipath TCP
– Basic principles
– Use cases

•  What's next?
– QUIC

TCP connection establishment

•  Three-way handshake

SYN, seq=1234, Options

SYN+ACK, ack=1235, seq=5678, Options

ACK, seq=1235, ack=5679

Data transfer

seq=1234, "abcd"

ACK, ack=1238, win=4

seq=1238, "efgh"

ACK, ack=1242, win=0
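
The acknowledgment and window values in this exchange follow from simple arithmetic. The sketch below replays it under two assumptions that the slide does not state: the receive buffer holds 8 bytes, and the application reads nothing during the exchange. The function name `receive` is illustrative.

```python
def receive(segments, buffer_size):
    """Replay in-order segments and return the (ack, win) pairs sent back.

    ack is the sequence number of the next expected byte; win is the free
    space left in the receive buffer (nothing is read by the application).
    """
    buffered = 0
    replies = []
    for seq, payload in segments:
        buffered += len(payload)
        replies.append((seq + len(payload), buffer_size - buffered))
    return replies


print(receive([(1234, b"abcd"), (1238, b"efgh")], buffer_size=8))
```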

Connection release

seq=1234, "abcd"

RST

Connection release

seq=1234, "abcd"

ACK, ack=1239

FIN, ack=350

seq=345, "ijkl"

FIN, seq=1238

FIN, seq=349

Multipath TCP

•  How can we efficiently use the multiple interfaces that are available on today's hosts?

Design objectives

•  Multipath TCP is an evolution of TCP

•  Design objectives
– Support unmodified applications
– Work over today's networks (IPv4 and IPv6)
– Works in all networks where regular TCP works

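The Multipath TCP bytestream model

[Figure: Client (IP: 1.2.3.4 and IP: 2.3.4.5) and Server (IP: 4.5.6.7 and IP: 6.7.8.9) exchange the bytestreams ABCDEF...111232 and 0988989 ... XYZZ; byte A travels over one path while bytes BCD travel over the other]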

The Multipath TCP protocol

•  Control plane
– How to manage a Multipath TCP connection that uses several paths?

•  Data plane
– How to transport data?

•  Congestion control
– How to control congestion over multiple paths?

A naïve Multipath TCP

SYN+Option

SYN+ACK+Option

ACK

seq=123, "abc"

seq=126, "def"

A naïve Multipath TCP
In today's Internet?

SYN+Option

SYN+ACK+Option

ACK

seq=123, "abc"

seq=126, "def"

There is no corresponding TCP connection

Design decision

– A Multipath TCP connection is composed of one or more regular TCP subflows that are combined

•  Each host maintains state that glues the TCP subflows that compose a Multipath TCP connection together

•  Each TCP subflow is sent over a single path and appears like a regular TCP connection along this path

Multipath TCP and the architecture

[Figure: protocol stack in which the Application uses the socket interface on top of Multipath TCP, which runs over subflows TCP1, TCP2, ..., TCPn above the Network, Datalink and Physical layers]

A. Ford, C. Raiciu, M. Handley, S. Barre, and J. Iyengar, "Architectural guidelines for Multipath TCP development", RFC 6182, 2011.

No modification to ease deployment

Multiple subflows to cope with middleboxes

A regular TCP connection

•  What is a regular TCP connection?
–  It starts with a three-way handshake
•  SYN segments may contain special options
– All data segments are sent in sequence
•  There is no gap in the sequence numbers
–  It is terminated by using FIN or RST

Multipath TCP

SYN+Option

SYN+ACK+Option

ACK

SYN+OtherOption

SYN+ACK+OtherOption

ACK

How to combine two TCP subflows?

SYN+Option

SYN+ACK+Option

ACK

SYN+OtherOption

SYN+ACK+OtherOption

ACK

How to link with blue subflow?

TCP 101
Identification of a TCP connection

Four tuple
–  IP source
–  IP dest
– Port source
– Port dest

All TCP segments contain the four tuple

[Figure: IP and TCP header layout; the four tuple is carried in the Source IP address, Destination IP address, Source port and Destination port fields]

How to link TCP subflows?

SYN, Portsrc=1234, Portdst=80 + Option

SYN+ACK [...]

ACK

SYN, Portsrc=1235, Portdst=80 + Option [link Portsrc=1234, Portdst=80]

A NAT could change addresses and port numbers

How to link TCP subflows?

SYN, Portsrc=1234, Portdst=80 + Option [Token=5678]

SYN+ACK + Option [Token=6543]

ACK

SYN, Portsrc=1235, Portdst=80 + Option [Token=6543]

MyToken=5678, YourToken=6543

MyToken=6543, YourToken=5678
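
Token-based linking can be sketched as a lookup table on the server. The class and method names below are hypothetical, and the sketch leaves out the authentication that a real implementation needs before accepting a new subflow; it only shows why a token survives a NAT while a four-tuple reference does not.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    my_token: int
    your_token: int
    subflows: list = field(default_factory=list)  # four-tuples as seen locally


class Server:
    def __init__(self):
        self.by_token = {}  # my_token -> Connection

    def accept_initial(self, four_tuple, client_token, server_token):
        """First subflow: remember both tokens exchanged in SYN and SYN+ACK."""
        conn = Connection(server_token, client_token, [four_tuple])
        self.by_token[server_token] = conn
        return conn

    def accept_join(self, four_tuple, token):
        """Later subflow: the SYN carries the server's token, not addresses."""
        conn = self.by_token.get(token)
        if conn is not None:
            conn.subflows.append(four_tuple)
        return conn


server = Server()
first = server.accept_initial(("1.2.3.4", 1234, "4.5.6.7", 80), 5678, 6543)
# A NAT rewrote the source address and port of the second SYN,
# yet the token still identifies the right connection.
second = server.accept_join(("9.9.9.9", 40001, "4.5.6.7", 80), 6543)
```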

TCP subflows

•  Which subflows can be associated to a Multipath TCP connection?
– At least one of the elements of the four-tuple needs to differ between two subflows
•  Local IP address
•  Remote IP address
•  Local port
•  Remote port

TCP subflows in practice

•  Multipath TCP supports subflow agility
– Client/server can add subflows at any time
– Client/server can remove subflows at any time

The Multipath TCP protocol

•  Control plane
– How to manage a Multipath TCP connection that uses several paths?

•  Data plane
– How to transport data?

•  Congestion control
– How to control congestion over multiple paths?

How to transfer data?

seq=123, "a"

seq=124, "b"

seq=125, "c"

seq=126, "d"

ack=124

ack=126

ack=125

ack=127

How to transfer data in today's Internet?

seq=123, "a"

seq=124, "b"

seq=125, "c"

ack=124

ack=126

ack=125

Gap in sequence numbering space
Some DPI will not allow this!

Multipath TCP
Data transfer

•  Two levels of sequence numbers

[Figure: on each host, a socket on top of Multipath TCP, which runs over subflows TCP1 and TCP2; the bytestream ABCDEF is numbered by the Data sequence #, while each subflow keeps its own TCP1 sequence # and TCP2 sequence #]

Multipath TCP
Data transfer

DSeq=0, seq=123, "a"

DSeq=1, seq=456, "b"

DSeq=2, seq=124, "c"

DAck=1, ack=124

DAck=3, ack=125

DAck=2, ack=457
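
The two levels of sequence numbers can be sketched as two independent counters: one per connection for the data sequence and one per subflow. The classes below are a hypothetical illustration, with the initial subflow sequence numbers 123 and 456 taken from the exchange above.

```python
class Subflow:
    def __init__(self, name, initial_seq):
        self.name = name
        self.next_seq = initial_seq  # regular TCP sequence number


class Sender:
    def __init__(self, subflows):
        self.subflows = subflows
        self.next_dseq = 0  # data sequence number, shared by all subflows

    def send(self, data, subflow_index):
        """Number the data at both levels and return the resulting segment."""
        sf = self.subflows[subflow_index]
        segment = (sf.name, self.next_dseq, sf.next_seq, data)
        self.next_dseq += len(data)
        sf.next_seq += len(data)
        return segment


sender = Sender([Subflow("TCP1", 123), Subflow("TCP2", 456)])
trace = [sender.send(b"a", 0), sender.send(b"b", 1), sender.send(b"c", 0)]
for name, dseq, seq, data in trace:
    print(name, "DSeq=%d" % dseq, "seq=%d" % seq, data)
```

Each subflow sees consecutive sequence numbers with no gap, so it looks like a regular TCP connection to a middlebox on its path.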

Multipath TCP
How to deal with losses?

•  Data losses over one TCP subflow
– Fast retransmit and timeout as in regular TCP

DSeq=0, seq=123, "a"

DAck=1, ack=124

DSeq=0, seq=123, "a"

DAck=1, ack=124

Multipath TCP

•  What happens when a TCP subflow fails?

DSeq=0, seq=123, "a"

DSeq=1, seq=456, "b"

DAck=0, ack=457

DSeq=0, seq=457, "a"

DAck=2, ack=458

Retransmission heuristics

•  Heuristics used by current Linux implementation
–  Fast retransmit is performed on the same subflow as the original transmission
– Upon timeout expiration, reevaluate whether the segment could be retransmitted over another subflow
– Upon loss of a subflow, all the unacknowledged data are retransmitted on other subflows

Flow control

•  How should the window-based flow control be performed?
–  Independent windows on each TCP subflow
– A single window that is shared among all TCP subflows

Independent windows

DSeq=0, seq=123, "a"

DSeq=1, seq=456, "b"

DAck=2, ack=457, win=100

DSeq=2, seq=457, "c"

DAck=3, ack=458, win=100

DAck=1, ack=124, win=0

Independent windows
possible problem

•  Impossible to retransmit, window is already full on green subflow

DSeq=0, seq=123, "a"

DSeq=1, seq=456, "b"

DAck=2, ack=457, win=0

A single window shared by all subflows

DSeq=0, seq=123, "a"

DSeq=1, seq=456, "b"

DAck=2, ack=457, win=10

DSeq=2, seq=457, "c"

DAck=3, ack=458, win=10

DAck=1, ack=124, win=10

A single window shared by all subflows
Impact of middleboxes

DSeq=0, seq=123, "a"

DSeq=1, seq=456, "b"

DAck=2, ack=457, win=100

DAck=1, ack=124, win=100

DAck=2, ack=457, win=5

Multipath TCP Windows

•  Multipath TCP maintains one window per Multipath TCP connection
– Window is relative to the last acked data (Data Ack)
– Window is shared among all subflows
•  It's up to the implementation to decide how the window is shared
– Window is transmitted inside the window field of the regular TCP header
–  If middleboxes change window field,
•  use largest window received at MPTCP-level
•  use received window over each subflow to cope with the flow control imposed by the middlebox

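Multipath TCP buffers

[Figure: send(...) and recv(...) calls on a socket above Multipath TCP; a Scheduler feeds the transmit queues of TCP1 and TCP2, which process only the regular TCP header; on the receive side, the reorder queue of each subflow processes only the TCP header, and resequencing is possible at the MPTCP level]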

Sending Multipath TCP information

•  How to exchange the Multipath TCP specific information between two hosts?

•  Option 1
– Use TLVs to encode data and control information inside payload of subflows

•  Option 2
– Use TCP options to encode all Multipath TCP information

Option 1: Michael Scharf, Thomas-Rolf Banniza, MCTCP: A Multipath Transport Shim Layer, GLOBECOM 2011

Multipath TCP with only options

•  Advantages
–  Normal way of extending TCP
–  Should be able to go through middleboxes or fallback

•  Drawbacks
–  Limited size of the TCP options, notably inside SYN
– What happens when middleboxes drop TCP options in data segments?

Multipath TCP using TLV

•  Advantages
– Multipath TCP could start as regular TCP and move to Multipath only when needed
–  Could be implemented as a library in user space
–  TLVs can be easily extended

•  Drawbacks
–  TCP segments contain TLVs including the data and not only the data
•  problem for middleboxes, DPI, ...
– Middleboxes become more difficult

Michael Scharf, Thomas-Rolf Banniza, MCTCP: A Multipath Transport Shim Layer, GLOBECOM 2011

Is it safe to use TCP options?

•  Known option (TS) in Data segments

[Figure: measurement results from Honda et al.]

Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.

Is it safe to use TCP options?

•  Unknown option in Data segments

[Figure: measurement results from Honda et al.]

Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.

Multipath TCP options

•  TCP option format

[Figure: option layout: Kind, Length, Option-specific data]

•  Initial design
– One option kind for each purpose (e.g. Data Sequence number)

•  Final design
– A single variable-length Multipath TCP option

Multipath TCP option

•  A single option type
–  to minimise the risk of having one option accepted by middleboxes in SYN segments and rejected in segments carrying data

[Figure: option layout: Kind, Length, Subtype, Subtype-specific data (variable length)]
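
The single-option layout can be illustrated with a small encoder and decoder. This is a simplified sketch: it uses option kind 30, the value assigned to Multipath TCP in RFC 6824, places the 4-bit subtype in the upper half of the third byte, and leaves the lower 4 bits (which are subtype specific in the real protocol) at zero. The function names are illustrative.

```python
import struct

MPTCP_KIND = 30  # TCP option kind assigned to Multipath TCP (RFC 6824)


def pack_mptcp_option(subtype: int, data: bytes) -> bytes:
    """Build Kind | Length | Subtype (upper 4 bits) | subtype-specific data."""
    if not 0 <= subtype <= 0xF:
        raise ValueError("subtype must fit in 4 bits")
    length = 3 + len(data)  # the length covers the whole option
    return struct.pack("!BBB", MPTCP_KIND, length, subtype << 4) + data


def unpack_mptcp_option(raw: bytes):
    """Return (subtype, data) or raise ValueError on a malformed option."""
    kind, length, third = struct.unpack("!BBB", raw[:3])
    if kind != MPTCP_KIND or length != len(raw):
        raise ValueError("not a well-formed Multipath TCP option")
    return third >> 4, raw[3:]
```

A middlebox that filters on the Kind byte treats every subtype the same way, which is the point of using a single option type.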

Data sequence numbers and TCP segments

•  How to transport Data sequence numbers?
– Same solution as for TCP
•  Data sequence number in TCP option is the Data sequence number of the first byte of the segment

[Figure: TCP header (Source port, Destination port, Sequence number, Acknowledgment number, THL, Reserved, Flags, Window, Checksum, Urgent pointer) with the Data sequence number carried before the Payload]

Multipath TCP
Data transfer

DSeq=0, seq=123, "a"

DSeq=1, seq=456, "b"

DSeq=2, seq=124, "c"

DAck=1, ack=124

DAck=3, ack=125

DAck=2, ack=457

Which middleboxes change TCP sequence numbers?

•  Some firewalls change TCP sequence numbers in SYN segments to ensure randomness
– fix for old Windows 95 bug

•  Transparent proxies terminate TCP connections

Middlebox interference

•  Data segments

Data, seq=12, "ab"

Data, seq=14, "cd"

Data, seq=12, "abcd"

Such a middlebox could also be the network adapter of the server that uses LRO to improve performance.

Segment coalescing

Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.

Data sequence numbers and middleboxes

seq=123, DSeq=0, "a"

seq=456, DSeq=1, "b"

seq=124, DSeq=2, "c"

seq=123, DSeq=2, "ac"

copies one option in coalesced segment

buffers small segments

seq=123, DSeq=0, "ac"

Data sequence numbers and middleboxes

seq=123, DSeq=0, "ab"

DSeq=0, seq=123, "a"

DSeq=0, seq=124, "b"

Middlebox only understands regular TCP

A "middlebox" that both splits and coalesces TCP segments

Data sequence numbers and middleboxes

•  How to avoid desynchronisation between the bytestream and data sequence numbers?

•  Solution
– Multipath TCP option carries mapping between Data sequence numbers and (difference between initial and current) subflow sequence numbers
•  mapping covers a part of the bytestream (length)

Multipath TCP
Data transfer

seq=123, DSS[0->123, len=1], "a"

seq=456, DSS[1->456, len=1], "b"

seq=124, DSS[2->124, len=1], "c"

DAck=1, ack=124

DAck=3, ack=125

DAck=2, ack=457

Data sequence numbers and middleboxes

seq=123, DSS[0->123, len=1], "a"

seq=456, DSS[1->456, len=1], "b"

seq=124, DSS[2->124, len=1], "c"

seq=123, DSS[0->123, len=1], "ac"

DAck=2, ack=125

DSeq=0, ack=457

seq=125, DSS[2->125, len=1], "c"

Data sequence numbers and middleboxes

seq=123, DSS[0->123, len=1], "a"

seq=456, DSS[1->456, len=1], "b"

seq=124, DSS[2->124, len=1], "c"

seq=123, DSS[2->124, len=1], "ac"

DAck=0, ack=125

seq=125, DSS[0->125, len=1], "a"

DAck=3, ack=126
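
How a receiver might use the mappings when a middlebox coalesces segments can be sketched as follows. The names `Mapping` and `place_bytes` are hypothetical, and the sketch uses absolute subflow sequence numbers as the slides do, whereas the protocol carries them relative to the initial subflow sequence number. Only bytes covered by a received mapping get a data sequence number; the others cannot be delivered and have to be sent again.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    dseq: int         # data sequence number of the first mapped byte
    subflow_seq: int  # subflow sequence number of the first mapped byte
    length: int       # number of bytes covered by the mapping


def place_bytes(seq, payload, mappings):
    """Return {data sequence number: byte} for the bytes covered by a mapping."""
    placed = {}
    for offset in range(len(payload)):
        s = seq + offset
        for m in mappings:
            if m.subflow_seq <= s < m.subflow_seq + m.length:
                placed[m.dseq + (s - m.subflow_seq)] = payload[offset:offset + 1]
                break
    return placed


# Coalesced segment "ac" that kept only the first option: "c" is unmapped.
first_kept = place_bytes(123, b"ac", [Mapping(0, 123, 1)])
# Coalesced segment "ac" that kept only the second option: "a" is unmapped.
second_kept = place_bytes(123, b"ac", [Mapping(2, 124, 1)])
```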

Multipath TCP and middleboxes

•  With the DSS mapping, Multipath TCP can cope with middleboxes that
– combine segments
– split segments

•  Are they the most annoying middleboxes for Multipath TCP?
– Unfortunately not

TCP sequence number and middleboxes

Honda, Michio, et al. "Is it still possible to extend TCP?" Proceedings of the 2011 ACM SIGCOMM conference on Internet measurement conference. ACM, 2011.

The worst middlebox

•  Is this an academic exercise or reality?

seq=123, DSS[1->123, len=2], "aXXXb"

DAck=3, ack=125

seq=125, DSS[3->125, len=2], "cd"

seq=123, DSS[1->123, len=2], "ab"

DAck=3, ack=128

seq=128, DSS[3->125, len=2], "cd"

The worst middlebox

•  Is unfortunately very old...
– Any ALG for a NAT

220 ProFTPD 1.3.3d Server (BELNET FTPD Server) [193.190.67.15]
ftp_login: user `<null>' pass `<null>' host `ftp.belnet.be'
Name (ftp.belnet.be:obo): anonymous
---> USER anonymous
331 Anonymous login ok, send your complete email address as your password
Password:
---> PASS XXXX
---> PORT 192,168,0,7,195,120
200 PORT command successful
---> LIST
150 Opening ASCII mode data connection for file list
lrw-r--r-- 1 ftp ftp 6 Jun 1 2011 pub -> mirror
226 Transfer complete

Coping with the worst middlebox

•  What should Multipath TCP do in the presence of such a worst middlebox?
– Do nothing and ignore the middlebox
•  but then the bytestream and the application would be broken and this problem will be difficult to debug by network administrators
– Detect the presence of the middlebox
•  and fallback to regular TCP (i.e. use a single path and nothing fancy)

Multipath TCP MUST work in all networks where regular TCP works.

Detecting the worst middlebox?

•  How can Multipath TCP detect a middlebox that modifies the bytestream and inserts/removes bytes?
– Various solutions were explored
–  In the end, Multipath TCP chose to include its own checksum to detect insertion/deletion of bytes
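
The checksum idea can be sketched with the 16-bit ones' complement Internet checksum. The pseudo-header layout used here (data sequence number, subflow sequence number, length) is a simplification chosen for the example, not the exact wire format, and the function names are illustrative.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def dss_checksum(dseq: int, subflow_seq: int, payload: bytes) -> int:
    """Checksum over a simplified pseudo-header plus the mapped payload."""
    pseudo = struct.pack("!QIH", dseq, subflow_seq, len(payload))
    return internet_checksum(pseudo + payload)


sent = dss_checksum(1, 123, b"ab")
# The middlebox turned "ab" into "aXXXb"; the mapping still covers 2 bytes,
# so the receiver checks the first two bytes it got, "aX".
received = dss_checksum(1, 123, b"aXXXb"[:2])
print("payload modified" if received != sent else "payload intact")
```

When the check fails, the receiver knows the bytestream was altered on this path and can fall back to regular TCP instead of delivering corrupted data.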

The worst middlebox

seq=123, DSS[1->123, len=2, Inv], "aXXXb"

seq=123, DSS[1->123, len=2, V], "ab"

RST, lastDSeq=0

RST, lastDSeq=0

seq=456, DSS[1->456, len=2, V], "ab"

DAck=3, ack=458

Multipath TCP
Data sequence numbers

•  What should be the length of the data sequence numbers?
– 32 bits
•  compact and compatible with TCP
•  wrap around problem at high speed requires PAWS
– 64 bits
•  wrap around is not an issue for most transfers today
•  takes more space inside each segment

Multipath TCP
Data sequence numbers

•  Data sequence numbers and Data acknowledgements
– Maintained inside implementation as 64 bits field
–  Implementations can, as an optimisation, only transmit the lower 32 bits of the data sequence and acknowledgements
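
One way a receiver could rebuild the 64-bit value from the transmitted lower 32 bits is to pick the candidate closest to the data sequence number it expects next. This is a sketch of that idea under the stated assumption, not the method mandated by the specification; `expand_dsn` is a hypothetical name.

```python
MOD32 = 1 << 32


def expand_dsn(low32: int, expected64: int) -> int:
    """Return the 64-bit number with the given low 32 bits nearest to expected64."""
    base = (expected64 & ~(MOD32 - 1)) | low32  # same upper half as expected
    candidates = [c for c in (base - MOD32, base, base + MOD32) if c >= 0]
    return min(candidates, key=lambda c: abs(c - expected64))


# Just before a 32-bit wrap: the lower bits restart near zero,
# and the upper half must be incremented.
print(hex(expand_dsn(0x10, 0x1FFFFFFF0)))
```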

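Data Sequence Signal option

[Figure: layout of the Data Sequence Signal option, with the annotations below]

Cumulative Data ack

A = Data ACK present, a = Data ACK is 8 octets, M = mapping present, m = DSN is 8 octets

Length of mapping, can extend beyond this segment

Computed over data covered by entire mapping + pseudo header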

The Multipath TCP protocol

•  Control plane
– How to manage a Multipath TCP connection that uses several paths?

•  Data plane
– How to transport data?

•  Congestion control
– How to control congestion over multiple paths?
–  Congestion windows on subflows MUST be coupled to ensure that Multipath TCP remains fair with regular TCP

AIMD in TCP

•  Congestion control mechanism
–  Each host maintains a congestion window (cwnd)
– No congestion
•  Congestion avoidance (additive increase)
–  increase cwnd by one segment every round-trip-time
–  Congestion
•  TCP detects congestion by detecting losses
•  Mild congestion (fast retransmit – multiplicative decrease)
–  cwnd = cwnd/2 and restart congestion avoidance
•  Severe congestion (timeout)
–  cwnd = 1, set slow-start-threshold and restart slow-start
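
These rules can be replayed with a few lines of code. The sketch works in whole segments and one step per round-trip-time; capping slow-start at the threshold and setting the threshold to half the window on loss are simplifying assumptions of the example, and the event names are made up.

```python
def aimd_step(cwnd, ssthresh, event):
    """Apply one event to (cwnd, ssthresh), both counted in segments."""
    if event == "rtt_without_loss":
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow-start: exponential increase
        else:
            cwnd += 1                       # congestion avoidance: +1 per RTT
    elif event == "fast_retransmit":        # mild congestion
        cwnd = max(cwnd // 2, 1)
        ssthresh = cwnd
    elif event == "timeout":                # severe congestion
        ssthresh = max(cwnd // 2, 1)
        cwnd = 1
    return cwnd, ssthresh


def trace(events, cwnd=1, ssthresh=8):
    """Return the successive values of cwnd for a list of events."""
    history = []
    for event in events:
        cwnd, ssthresh = aimd_step(cwnd, ssthresh, event)
        history.append(cwnd)
    return history


events = (["rtt_without_loss"] * 5 + ["fast_retransmit", "rtt_without_loss",
          "timeout", "rtt_without_loss", "rtt_without_loss"])
print(trace(events))
```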

Evolution of the congestion window

[Figure: Cwnd versus Time, showing slow-start (exponential increase of cwnd), the threshold, congestion avoidance (linear increase of cwnd) and two fast retransmits]

Congestion control for Multipath TCP

•  Simple approach
–  independent congestion windows

[Figure: evolution of the congestion window on each subflow, each with its own threshold]

Independent congestion windows

•  Problem

[Figure: example with a 12 Mbps link]

Coupled congestion control

•  Congestion windows are coupled
– congestion window growth cannot be faster than TCP with a single flow
– Coupled congestion control aims at moving traffic away from congested path
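
As an illustration of coupling, the sketch below follows the Linked Increases Algorithm published in RFC 6356, which the slides do not name; windows are counted in segments and the function names are illustrative. For each acknowledged segment on subflow i, the window grows by min(alpha / cwnd_total, 1 / cwnd_i), so a subflow never grows faster than a regular TCP flow would.

```python
def lia_alpha(cwnds, rtts):
    """Aggressiveness factor computed from all subflows of the connection."""
    total = sum(cwnds)
    best = max(c / (r * r) for c, r in zip(cwnds, rtts))
    denom = sum(c / r for c, r in zip(cwnds, rtts)) ** 2
    return total * best / denom


def lia_increase(i, cwnds, rtts):
    """Window increase (in segments) for one acknowledged segment on subflow i."""
    alpha = lia_alpha(cwnds, rtts)
    return min(alpha / sum(cwnds), 1.0 / cwnds[i])


# One subflow: same increase as regular TCP (1/cwnd per acknowledged segment).
print(lia_increase(0, [10], [0.1]))
# Two identical subflows: each grows more slowly than a regular TCP flow.
print(lia_increase(0, [10, 10], [0.1, 0.1]))
```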

Agenda

•  Internet transport protocols
– TCP
– SCTP

•  Multipath TCP
– Basic principles
– Use cases

•  What's next?
– QUIC

Multipath TCP use cases
The beast

TCP on servers

•  How to increase server bandwidth?

•  Load balancing techniques
– packet per packet
– per flow load balancing
•  each TCP connection is mapped onto one interface

Increasing server bandwidth with Multipath TCP

•  Load balancing with Multipath TCP
–  Congestion control efficiently uses the two links for each MPTCP connection
– Automatic failover in case of failures

How fast can Multipath TCP go?

http://linux.slashdot.org/story/13/03/23/0054252/a-50-gbps-connection-with-multipath-tcp

How fast can Multipath TCP go?

Datacenters evolve

•  Traditional topologies are tree-based
–  Poor performance
– Not fault tolerant

•  Shift towards multipath topologies: FatTree, BCube, VL2, Cisco, EC2

C. Raiciu, et al. “Improving datacenter performance and robustness with multipath TCP,” ACM SIGCOMM 2011.

Fat Tree Topology [Fares et al., 2008; Clos, 1953]

[Figure: Fat Tree with K=4 and 1 Gbps links: aggregation switches, K pods with K switches each, and racks of servers]

C. Raiciu, et al. “Improving datacenter performance and robustness with multipath TCP,” ACM SIGCOMM 2011.

Collisions

TCP in datacenters

TCP in FAT tree networks
Cost of collisions

C. Raiciu, et al. “Improving datacenter performance and robustness with multipath TCP,” ACM SIGCOMM 2011.

[Figure: Throughput (Mb/s), from 0 to 1000, versus Rank of Flow, from 0 to 9000; legend: MPTCP, Optimal Throughput, TCP Flow Throughput]

How to get rid of these collisions?

•  Consider TCP performance as an optimisation problem

C. Raiciu, et al. “Improving datacenter performance and robustness with multipath TCP,” ACM SIGCOMM 2011.

The Multipath TCP way

Two subflows differ by their source port

ECMP balances the subflows over different paths
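
Why a different source port can lead to a different path is easy to see with a toy ECMP function. Real routers use their own vendor-specific hash; SHA-256 and the name `ecmp_path` are stand-ins chosen only to make the example self-contained.

```python
import hashlib


def ecmp_path(src_ip, src_port, dst_ip, dst_port, n_paths, proto=6):
    """Pick one of n_paths equal-cost paths from a hash of the five-tuple."""
    key = "|".join(map(str, (src_ip, src_port, dst_ip, dst_port, proto))).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


# All packets of one subflow hash to the same path (no reordering), while
# subflows that differ only by source port can land on different paths.
paths = {port: ecmp_path("10.0.1.2", port, "10.0.3.4", 80, n_paths=4)
         for port in range(1024, 2048)}
print(sorted(set(paths.values())))
```

The hash is deterministic but hard to predict, so a host cannot choose in advance which path a given source port will take.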

MPTCP better utilizes the FatTree network

[Figure: Throughput (Mb/s), from 0 to 1000, versus Rank of Flow, from 0 to 9000; legend: MPTCP, Optimal Throughput, TCP Flow Throughput]

C. Raiciu, et al. “Improving datacenter performance and robustness with multipath TCP,” ACM SIGCOMM 2011.

See also G. Detal, et al., Revisiting Flow-Based Load Balancing: Stateless Path Selection in Data Center Networks, Computer Networks, April 2013 for extensions to ECMP for MPTCP

How many subflows does Multipath TCP need?

[Figure: Total Throughput. Throughput (% of optimal), from 0 to 100, for x-axis values RLB, 2, 3, 4, 5, 6, 7, 8; legend: Multipath TCP, TCP]

C. Raiciu, et al. “Improving datacenter performance and robustness with multipath TCP,” ACM SIGCOMM 2011.

Can we improve Multipath TCP?

•  Two subflows may follow similar paths

Improving ECMP

•  ECMP's hash
–  good load balancing
–  impossible to predict result

•  CFLB
–  replaces hash with block cipher
–  hosts can select paths for Multipath TCP subflows provided they know datacenter topology

G. Detal, Ch. Paasch, S. van der Linden, P. Mérindol, G. Avoine, O. Bonaventure, Revisiting Flow-Based Load Balancing: Stateless Path Selection in Data Center Networks, to appear in Computer Networks

Multipath TCP with CFLB in Fat-Tree

G. Detal, Ch. Paasch, S. van der Linden, P. Mérindol, G. Avoine, O. Bonaventure, Revisiting Flow-Based Load Balancing: Stateless Path Selection in Data Center Networks, to appear in Computer Networks

Multipath TCP on EC2

•  Amazon EC2: infrastructure as a service
– We can borrow virtual machines by the hour
–  These run in Amazon data centers worldwide
– We can boot our own kernel

•  A few availability zones have multipath topologies
–  2-8 paths available between hosts not on the same machine or in the same rack
–  Available via ECMP

Amazon EC2 Experiment

•  40 medium CPU instances running MPTCP
•  During 12 hours, we sequentially ran all-to-all iperf cycling through:
– TCP
– MPTCP (2 and 4 subflows)

MPTCP improves performance on EC2

[Figure: Throughput (Mb/s), from 0 to 1000, versus Flow Rank, from 0 to 3000; legend: TCP; MPTCP, 4 subflows; MPTCP, 2 subflows; annotation: Same Rack]

C. Raiciu, et al. “Improving datacenter performance and robustness with multipath TCP,” ACM SIGCOMM 2011.

Motivation

•  One device, many IP-enabled interfaces

ssh with Multipath TCP

MPTCP over WiFi/3G

8 Mbps, 20 ms

2 Mbps, 150 ms

TCP over WiFi/3G

C. Raiciu, et al. “How hard can it be? Designing and implementing a deployable multipath TCP,” NSDI'12: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, 2012.

MPTCP over WiFi/3G

C. Raiciu, et al. “How hard can it be? Designing and implementing a deployable multipath TCP,” NSDI'12: Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation, 2012.

MPTCP over WiFi/3G

Multipath TCP increases throughput

MPTCP over WiFi/3G

What happened here?

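Understanding the performance issue

8 Mbps, 20 ms

2 Mbps, 150 ms

[Figure: segments A, B, C and D and the shared Window; segment A is sent a second time]

Window full! No new data can be sent on WiFi path

Reinject segment on fast path

Halve congestion window on slow subflow

MPTCP over WiFi/3G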

Multipath TCP use cases
Low latency for Siri

•  Long-lived TLS connections

[Figure: voice samples sent over WiFi and over 3G/LTE]

Multipath TCP use cases
High bandwidth on smartphones

•  Koreans want 800+ Mbps on smartphones

[Figure: WiFi and 4G/LTE paths; Multipath TCP up to a SOCKS proxy, regular TCP beyond it]

Faster broadband networks?

Multipath TCP use cases
Hybrid Access Networks

[Figure: DSL and 4G/LTE paths; Multipath TCP up to a Hybrid Access Gateway, regular TCP on either side]

Agenda

•  Internet transport protocols
– TCP
– SCTP

•  Multipath TCP
– Basic principles
– Use cases

•  What's next?
– QUIC

Issues with the current stack

[Figure: three protocol stacks, each above the Physical and Datalink layers]

IPv4/IPv6, TCP, HTTP 1.1
– ASCII, difficult to parse, no priority
– Unsecure
– Wait for three-way handshake before data transfer
– First bytes after 2 RTTs

IPv4/IPv6, TCP, TLS, HTTP/2
– Secure, but adds more delay
– First bytes after 3-4 RTTs

IPv4/IPv6, UDP, QUIC
– First bytes after 0 RTT

QUIC in a nutshell

•  First connection attempt

CHLO [SNI, VER] (Server Name and Version)

REJ [Config, Token, Certificate] (Rejected)

CHLO [Token, Crypto info]

DATA [Encrypted]

SHLO [Config, Token, Certificate]

DATA [Encrypted]

QUIC features

•  Congestion control
– Leverages TCP's long history (CUBIC)

•  Retransmissions
– Better than with regular TCP
– Each segment has a different seqnum
•  Avoids retransmission ambiguities

•  Selective acknowledgements
– Cleaner than in TCP

QUIC usage at Google

QUIC handshakes fail when RTTs are greater than 2.5 seconds or when UDP is blocked

Source: J. Iyengar, QUIC Overview, IETF 93, July 2015, Prague

QUIC
Reducing delays

[Figure: handshake delays compared for TCP, TCP + TLS, and QUIC (equivalent to TCP + TLS)]

Source: J. Iyengar, QUIC Overview, IETF 93, July 2015, Prague

Why running QUIC over UDP?

•  Simplest transport protocol
–  Supported correctly by all operating systems
–  Supported correctly by all middleboxes

•  Application can entirely control everything
–  Same version of QUIC runs on all platforms
– QUIC can be upgraded as frequently as the application
– Application developer does not need to coordinate with IETF or anyone

How to cope with middleboxes?

•  Very few middleboxes interfere with UDP
– Some middleboxes drop UDP segments
•  Applications will detect and fallback to TCP
– Some middleboxes rate limit UDP
•  Applications will detect and fallback to TCP

•  What about middleboxes optimising QUIC/UDP?
– Nightmare for Google
– Everything in QUIC (payload and headers) is encrypted

TFO: A Faster TCP

•  Simple idea: send data in SYN segments
– Modern version of T/TCP

SYN (Src=C, seq=x, HTTP GET)

SYN+ACK (Dest=C, ack=x+1, seq=y, HTTP Resp)

ACK (Src=A, seq=x)

Internet transport layer

•  Still lots of innovation for an old layer…
–  TCP extensions
•  Initial window, TCP Fast Open, …
– Multipath TCP is getting deployed
•  RFC 6824 was published in January 2013
–  But middleboxes have ossified the Internet

•  Other protocols
–  QUIC
•  Pushed by Google for web applications
–  TCPINC
•  Support encryption inside transport layer
–  TLS 1.3
•  Faster handshake and lower delays
