the transport layer: tcp and udp - moodle 2017-2018 · the transport layer: tcp and udp jean‐yves...

59
1 The Transport Layer: TCP and UDP Jean‐Yves Le Boudec 2014 ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

Upload: others

Post on 22-Mar-2020

23 views

Category:

Documents


0 download

TRANSCRIPT

1

The Transport Layer: TCP and UDPJean‐YvesLeBoudec

2014

ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE

Contents

1. Thetransportlayer,UDP2. TCPBasics:SlidingWindowandFlow

Control3. TCPConnectionsandSockets4. MoreTCPBellsandWhistles

5. Whereshouldpacketlossesberepaired?

2

TextbookChapter4:TheTransportLayer

1. The Transport Layer

Reminder:network+link+phy carrypacketsend‐to‐endtransportlayermakesnetworkservicesavailabletoprogramsisinendsystemsonly,notinrouters

In TCP/IP therearemainly two transportlayersUDP(UserDatagramProtocol)TCP(TransmissionControlProtocol): UDP+errorrecovery+flowcontrol

ThereisnoTCPv6norUDPv6,thesameTCPandUDPareusedoverIPv4andIPv6

3

4

UDP Uses Port Numbers

Host IP addr=B

Host IP addr=A

IP SA=A DA=B prot=UDPsource port=1267destination port=53…data…

processsa

processra

UDP

processqa

processpa

TCP

IP

1267

processsb

processrb

UDP

processqb

processpb

TCP

IP

53

IP network

UDP Source Port UDP Dest Port UDP Message Length UDP Checksum

data

IP header

UDP datagramIP datagram

5

Thepictureshowstwoprocesses(=applicationprograms)pa,andpb,arecommunicating.Eachofthemisassociatedlocallywithaport,asshowninthefigure.

TheexampleshowsapacketsentbythenameresolverprocessathostA,tothenameserverprocessathostB.TheUDPheadercontainsthesourceanddestinationports.ThedestinationportnumberisusedtocontactthenameserverprocessatB;thesourceportisnotuseddirectly;itwillbeusedintheresponsefromBtoA.

TheUDPheaderalsocontainsachecksumtheprotecttheUDPdataplustheIPaddressesandpacketlength.Checksumcomputationisnotperformedbyallsystems.Portsare16bitsunsignedintegers.Theyaredefinedstaticallyordynamically.Typically,aserverusesaportnumberdefinedstatically.

Standardservicesusewell‐knownports;forexample,allDNSserversuseport53(lookat/etc/services).Portsthatareallocateddynamicallyarecalledephemeral.Theyareusuallyabove1024.Ifyouwriteyourownclientserverapplicationonamultiprogrammingmachine,youneedtodefineyourownserverportnumberandcodeitintoyourapplication.

The UDP service is message oriented

UDPserviceinterfaceonemessage,upto65,535bytesdestinationaddress,destinationport,sourceaddress,sourceportdestinationaddresscanbeunicastormulticast

UDPserviceismessageorientedUDPdeliversexactlythemessage(called“Datagram”)ornothingconsecutivemessagesmayarriveindisordermessagemaybelost‐‐ applicationmusthandle

IfaUDPmessageislargerthanthepossiblemaximumsizefortheIPlayer,MTU,thenfragmentationoccursattheIPlayer–thisisnotvisibletotheapplicationprogram

6

UDP is used via a Socket LibraryThesocketlibraryprovidesaprogramminginterfacetoTCPandUDPThefigureshowstoyclientandserverUDPprograms.Theclientsendsonestringofcharstotheserver,whichsimplyreceives(anddisplays)it.

socket(AF_INET,…)createsanIPv4socketandreturnsanumber(=filedescriptor)ifsuccessful;socket(AF_INET6,…)createsanIPv6socketbind()associatesthelocalportnumberwiththesocketsendto()givesthedestinationIPaddress,portnumberandthemessagetosendrecvFrom()blocksuntilonemessageisreceivedforthisportnumber.ItreturnsthesourceIPaddressandportnumberandthemessage.

7

client

socket();

bind();

sendto();

close();

server

socket();

bind();

rcvfrom();

% ./udpClient <destAddr> bonjour les amis%

% ./udpServ &%

Is there a UDPv6 ?ThereisnoUDPv6(norTCPv6),astheUDPandTCPprotocolsarenotaffectedbythechoiceofIPv4orIPv6

However,thereareUDPv4socketsandUDPv6sockets,i.e.theserviceinterfacesareaffected.

AnapplicationprogrammustdecidewhethertouseaUDPv4orUDPv6socket;

inprinciple,itusesDNStoknowwhatisavailable;ifbothIPv4andIPv6areavailable,IPv6ispreferred

8

socket(AF_INET,…)or

socket(AF_INET6…)

9

How the Operating System views UDP

id=3 id=4

buffer buffer

port=32456 port=32654

Application program

UDP

IPaddress=128.178.151.84

IPv4socket

IPv4socket

id=5

IPv6socket

IPv6 packet IPv4 packets

port=32456

UDP SDUs

address=2001:620:618:1a6:3:80b2:9754:1

How the Operating System views UDP

Onthesendingside:OperatingSystemsendstheUDPdatagramassoonaspossibleOnthereceivingside:OperatingSystemre‐assemblesUDPdatagram(ifrequired)andkeepstheminbufferreadytobereadIPv6socketsareinadifferentspacethan IPv4sockets

10

Lisa’s browser sends DNS query to DNS server, over UDP. What happens if query or answer is 

lost ?

1. Nameresolverinbrowserwaitsfortimeout,ifnoanswerreceivedbeforetimeout,sendsagain

2. MessagescannotbelostbecauseUDPassuresmessageintegrity

3. Noneoftheabove

11

Name resolve

r in brow

se.

Messages cannot b

e lost.

None of the ab

ove

0%0%0%

2. TCP Basics: Sliding Window and Flow Control

IntheInternet,packetsmaybelostbufferoverflowphysicallayererrors

UDPapplicationmusthandlelossTCPsolves theproblem onceforall

12

TCP offers in‐sequence, lossless delivery

WhatdoesTCPdo?TCPguaranteesthatalldataisdeliveredinsequenceandwithoutloss,unlesstheconnectionisbroken;

HowdoesTCPwork?dataisnumbered(sequencenumbers)aconnection(=synchronizationofsequencenumbers)isopenedbetweentwoprocessesdataisnumberedandTCPwaitsforacknowledgements;ifmissingdataisdetected,TCPre‐transmits

13

14

TCP Basic Operation 1: showing SEQ and ACKseq 8001:8501

ack 8501seq 8501:9001

seq 9001:9501

seq 9501:10001

seq 8501:9001ack 8501

ack 9001

seq 9001:9501

Timeout!

1

2

3

4

56

7

8

9

deliverbytes

8001:8501

deliverbytes

8501:9001

deliverbytes

9001:10001

A B

10

ThepreviousslideshowsAintheroleofsenderandBofreceiver.TheapplicationatAsendsdatainblocksof500bytes.Themaximumsegmentsizeis1000bytes.Rangessuchas8001:8501meanbytesnumbers8001to8500.Packets 3,4and7arelost.BreturnsanacknowledgementintheACKfield.TheACKfieldiscumulative,soACK8501means:Bisacknowledgingallbytesupto(excluding)number8501.Atline8,thetimerthatwassetatline3expires(Ahasnotreceivedanyacknowledgementforthebytesinthepacketsentatline3).Are‐sendsdatathatisdetectedaslost,i.e.bytes8501:9001.Whenreceivingpacket8,Bcandelivertotheapplicationallbytes8501:9001.Whenreceivingpacket10,Bcandeliverbytes9001:10001becausepacket5wasreceivedandkeptbyBinthereceivebuffer.

15

16

TCP Basic Operation 1: showing SEQ, ACK and SACKseq 8001:8501

ack 8501seq 8501:9001

seq 9001:9501

seq 9501:10001

seq 8501:9501ack 8501sack(9501:10001)

ack 10001

seq 10001:10501

1

2

3

4

56

7

8

9

deliverbytes

8001:8501

deliverbytes

8501:10001

deliverbytes

10001:10501

A B

10TcpMaxDupACKs set to 1 at A

InadditiontotheACKfield,mostTCPimplementationalsousetheSACKfield(SelectiveAcknowledgement). ThepreviousslideshowstheoperationofTCPwithSACK.TheapplicationatAsendsdatainblocksof500bytes.Themaximumsegmentsizeis1000bytes.Packets3and4arelost.Atline6,Bisacknowledgesallbytesupto(excluding)number8501.Atline7,Backnowledgesallbytesupto8501andintherange9501:10001.Sincethesetofacknowledgedbytesisnotcontiguous,theSACKoptionisused.Itcontainsupto3blocksthatareacknowledgedinadditiontotherangedescribedbytheACKfield.Atline8,Adetectsthatthebytes8501:9501 werelostandre‐sendsthem.Sincethemaximumsegmentsizeis1000bytes,onlyonepacketissent.Whenreceivingpacket8,Bcandeliverbytes9001:10001becausepacket5wasreceivedandkeptinthereceivebuffer.

17

TCP receiver uses a receive buffer = re‐sequencing bufferto store incoming packets before delivering them to 

applicationWhyinvented?

ApplicationmaynotbereadytoconsumedataPacketsmayneedre‐sequencing;out‐of‐sequencedataisstoredbutisnotvisibletoapplication

18

8001:8501

9501:10001

8001:10001

8001:8501

Can be read (received)by app

Invisible to app(cannot be read)

TCP uses a sliding window

Thereceivebuffermayoverflowifonepieceofdata“hangs”

E.g.multiplelossesaffectingthesamepacket

Thisiswhytheslidingwindowwasinvented

limitsthenumberofdata“onthefly”

19

P0

A1

P1

P2

A2

PnP0 again

Pn+1

P1

P1 P2

P1 P2 ... Pn

P1 P2 ... Pn+1

ReceiveBuffer

How the sliding window works

loweredge=smallestnonacknowledgedsequencenumber

upperedge=loweredgewindowsize

Onlysequencenumbersthatareinthewindowmaybesent

20

Window size = 4’000 bytes; one packet is 1’000 bytes

Window Usable part of the window

At time  , sender…

1. …cansendpacket42. …cannotsendpacket43. Itdependsonwhetherdata

wasconsumedbyapplication4. Idon’tknow

21

Windowsize=4’000bytes,onepacket=1’000bytesSlidingwindowwasinitializedattime

… can send

 packe

t 4… cann

ot se

nd packet 4

It de

pend

s on whe

ther da.

I don

’t k n

ow

0% 0%0%0%

Sliding Window is not sufficient to limit buffer size at receiver

Datathatisreceivedin‐sequenceremainsinreceivebufferuntilconsumedbyapplication(typicallyusingasocket“read”or“receive”)Aslowapplicationcouldcausebufferoverflow

23

Applicationreadsreceivebuffer

Applicationreads

Window size = 4’000 bytesOne packet =          = 1’000 bytes

24

Window Flow Control is used to prevent receive buffer overflow

TCPconstantlyadaptsthesizeofthewindowbysending“window”advertisementsbacktothesource.

WindowsizeissettoavailablebuffersizeIfnospaceinbuffer,windowsizeissetto0

25

0 1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

S=1

ack =‐1,window =2

S=0

S=2S=3

S=4

ack =0,window =2

S=5

S=6

0 1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

ack =2,window =4

0 1 2 3 4 5 6 7 8 9 10 11 12

ack =0,window =4

0 1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

ack =4,window =2

0 1 2 3 4 5 6 7 8 9 10 11 120 1 2 3 4 5 6 7 8 9 10 11 12 ack =6,window =0

0 1 2 3 4 5 6 7 8 9 10 11 12ack =6,window =4

0 1 2 3 4 5 6 7 8 9 10 11 12

0 1 2 3 4 5 6 7 8 9 10 11 12

S=7

1 unit of data = 1’000 bytes1 packet =  1’000 bytes

1unitofdata=1’000bytes1packet=1’000bytes 2626

ack =4,window =2

S = 1

ack =‐1,window =2

S = 0

S = 2

S = 3

S = 4

ack =0,window =2

S = 5

S = 6

ack =2,window =4

ack =0,window =4

ack =6,window =0ack =6,window =4

S = 7

3 4 5 6

5 6

7 8 9 10

1 2 3 4

1 2 3 4

1 2 3 4

1 2 3 4

3 4 5 6

3 4 5 6

3 4 5 6

7 8 9 10

0 1

0 1

1 2

-2 -1-3

-2 -1-3

-2 -1 0 -3

-2 -1 0 -3

-2 -1 0 1-3

-2 -1 0 1-3 2

-2 -1 0 1-3 2

-2 -1 0 1-3 2 3

-2 -1 0 1-3 2 3 4

-2 -1 0 1-3 2 3 4 5

-2 -1 0 1-3 2 3 4 5 6

-2 -1 0 1-3 2 3 4 5 6

freebuffer

dataacked butnotyet consumed

s.read()

s.read()

s.read()

s.read()

27

Window Flow Control is used to prevent receive buffer overflow

TCPconstantlyadaptsthesizeofthewindowbysending“window”advertisementsbacktothesource.

WindowsizeissettoavailablebuffersizeIfnospaceinbuffer,windowsizeissetto0

Thisiscalled“FlowControl”=adaptsendingrateofsourcetospeedofreceiverCongestionControl(seelater),whichadaptsrateofsourceto

stateofnetwork

28

TCP Basic Operation, Putting Things Together

bytes10001:10500are available

8001:8501(500) ack 101 win 6000

101:201(100) ack 8501 win 40008501:9001(500) ack 201 win 14247

9001:9501(500) ack 201 win 14247

9501:10001(500) ack 201 win 14247

(0) ack 8501 sack 9001:9501 win 4000

8501:9001(500) ack 251 win 14247201:251(50) ack 8501 sack 9001:10001 win 4000

251:401(150) ack 10001 win 2500

10001:10501(500) ack 401 win 14247

1

2

3

4

56

7

8

9

10

bytes...:8500 are available

and consumed

bytes8501:10000 are

available

A B

appconsumes

bytes8501:10000

251:401(150) ack 10001 win 4000

11

29

Thepictureshowsasampleexchangeofmessages.Everypacketcarriesthesequencenumberforthebytesinthepacket;inthereversedirection,packetscontaintheacknowledgementsforthebytesalreadyreceivedinsequence.Theconnectionisbidirectional,withacknowledgementsandsequencenumbersforeachdirection.SohereAandBarebothsendersandreceivers.Acknowledgementsarenotsentinseparatepackets(“piggybacking”),butareintheTCPheader.Everysegmentthuscontainsasequencenumber(foritself),plusanack number(forthereversedirection).Thefollowingnotationisused:

firstByte”:”lastByte+1“(“segmentDataLength”)ack”ackNumber+1“win”offeredWindowSise.Notethe+1withack andlastByte numbers.

Atline8,Aretransmitsthelostdata.Whenpacket8isreceived,theapplicationisnotyetreadytoreadthedata.Later,theapplicationreads(andconsumes)thedata8501:10001.ThisfreessomebufferspaceonthereceivingsideofB;thewindowcannowbeincreasedto4000.Atline10,BsendsanemptyTCPsegmentwiththenewvalueofthewindow.Notethatnumbersonthefigureareroundedforsimplicity.Inrealexampleswearemorelikelytoseenon‐roundnumbers(between0and232 ‐1).Theinitialsequencenumberisnot0,butischosenatrandom.

In the absence of loss, and on a link with 

capacity  packets per second, the window size required for sending at the 

maximum possible rate is…

4. Noneoftheabove5. Idon’tknow

30

time

  

 =

× 

 

 =  

  

 = 

 No

ne of

 the a

bove

I don

’t kn

ow

0% 0%0%0%0%

32

3. TCP Connections and Sockets

TCPrequiresthataconnection(=synchronization)isopenedbeforetransmittingdata

Usedtoagreeonsequencenumbersandmakesurebuffersandwindowareinitiallyempty

ThenextslideshowsthestatesofaTCPconnection:Beforedatatransfertakesplace,theTCPconnectionisopenedusingSYNpackets.Theeffectistosynchronizethecountersonbothsides.Theinitialsequencenumberisarandomnumber.Theconnectioncanbeclosedinanumberofways.Thepictureshowsagracefulreleasewherebothsidesoftheconnectionareclosedinturn.RememberthatTCPconnectionsinvolveonlytwohosts;routersinbetweenarenotinvolved.

Therearemanymoresubtleties(e.g.howtohandleconnectiontermination,lostorduplicatedpacketsduringconnectionsetup,etc—seeTextbooksections4.3.1and4.3.2).

33

TCP Connection Phases

SYN, seq=xsyn_sent

SYN seq=y, ack=x+1

ack=y+1establishedestablished

snc_rcvd

listen

FIN, seq=u

ack=v+1

ack=u+1

FIN seq=vfin_wait_2

time_wait

close_wait

last_ack

closed

application active open passive open

application close:

active closefin_wait_1

Connection

Setup

Data Transfer

Connection

Release

34

flags meaning NS used for explicit congestion notificationCWR used for explicit congestion notificationECN used for explicit congestion notificationurg urgent ptr is validack ack field is validpsh this seg requests a pushrst reset the connectionsyn connection setupfin sender has reached end of byte stream

paddingoptions (SACK, …)

srce port dest port

sequence number

ack number

hlen windowflagsrsvd

urgent pointerchecksum

segment data (if any)

TCPheader(20 Bytes + options)

IP header (20 or 40 B + options)

<= MSS bytes

TCP Segment FormatThepreviousslideshowstheTCPsegmentformat. thepushbitcanbeusedbytheupperlayerusingTCP;itforcesTCPonthesendingsideto

createasegmentimmediately.Ifitisnotset,TCPmaypacktogetherseveralSDUs(=datapassedtoTCPbytheupperlayer)intoonePDU(=segment).Onthereceivingside,thepushbitforcesTCPtodeliverthedataimmediately.Ifitisnotset,TCPmaypacktogetherseveralPDUsintooneSDU.ThisisbecauseofthestreamorientationofTCP.TCPacceptsanddeliverscontiguoussetsofbytes,withoutanystructurevisibletoTCP.ThepushbitusedbyTelnetaftereveryendofline.

theurgentbitindicatesthatthereisurgentdata,pointedtobytheurgentpointer(theurgentdataneednotbeinthesegment).ThereceivingTCPmustinformtheapplicationthatthereisurgentdata.Otherwise,thesegmentsdonotreceiveanyspecialtreatment.ThisisusedbyTelnettosendinterrupttypecommands.

RSTisusedtoindicateaRESETcommand.Itsreceptioncausestheconnectiontobeaborted.

SYNandFINareusedtoindicateconnectionsetupandclose.Theyeachconsumeonesequencenumber.

Thesequencenumberisthatofthefirstbyteinthedata.Theack numberisthenextexpectedsequencenumber.

OptionscontainforexampletheMaximumSegmentSize(MSS)normallyinSYNsegments(negotiationofthemaximumsizefortheconnectionresultsinthesmallestvaluetobeselected)andSACKblocks.

Thechecksumismandatory TheNS,CRWandECNbitsareusedforcongestioncontrol(seecongestioncontrolmodule).

35

TCP Sockets

TCPisusedbymeansofsockets,likeUDPHowever,TCPsocketsaremorecomplicatedbecauseoftheneedtoopen/closeaconnectionOpeningaTCPconnectionrequiresonesidetolisten(thissideiscalled“server”)andonesidetoconnect(thatsideiscalled“client”)At1,clientcanusetheconnectiontosendorreceivedataonthissocket

36

client

s=socket();

server

s1=socket();

bind();

connect();

bind();

listen();

s2=accept();SYN

SYN ACK

ACK1 2

Thefigureshowstoyclientandservers.Theclientsendsastringofcharstotheserverwhichreadsanddisplaysit.socket(AF_INET,…)createsanIPv4socketandreturnsanumber(=filedescriptor)ifsuccessful;socket(AF_INET_6,…)createsanIPv6socketbind()associatesthelocalportnumberwiththesocketconnect()associatestheremoteIPaddressandportnumberwiththesocketandsendsaSYNpacketsend()sendsablockofdatatotheremotedestinationlisten()declaresthesizeofthebufferusedforstoringincomingSYNpackets;accept()blocksuntilaSYNpacketisreceivedforthislocalportnumber.Itcreatesanewsocket(inpink)andreturnsthefiledescriptortobeusedtointeractwiththisnewsocketreceive()blocksuntiloneblockofdataisreadytobeconsumedonthisportnumber.Youmusttellintheargumentofreceivehowmanybytesatmostyouwanttoread.Itreturnsthenumberofbytesthatiseffectivelyreturnedandtheblockofdata.

37

A New Socket is Created by Accept() 

At2,onserverside,anewsocket(s2)iscreated–willbeusedbyservertosend/receivedataThisexampleshowsasimplisticprogram:clientsendsonemessagetoserverandquits;serverhandlesoneclientatatime.

38

12

client

s=socket();

server

s1=socket();

bind(s,…);

connect(s,…);

send(s,…);

close(s);

bind(s1,…);

listen(s1,…);

s2=accept(s1,…);

receive(s2,…);

close(s2);

A More Typical Server

TCPServertypicallyusesparallelthreadsofexecutiontohandleseveralTCPconnections+tolistentoincomingconnections

39

40

How the Operating System views TCP Sockets

TCP

IP

id=3 id=4

buffer

port=32456

address=128.178.151.84

id=5

buffer

IPv4socket

IPv4socket

IPv4socket

Application program

id=6

port=32456

IPv6socket

IPv6socket

address=2001:620:618:1a6:3:80b2:9754:1

id=7

IPv6 packetsIPv4 packets

Connectionrequests

Appdata

Appdata

Appdata

Connectionrequests

41

TCP views data as a stream of bytes

TCPuses portnumberslikeUDPeg.TCPport80isusedforwebserver.TCPrequiresthataconnectionisopenedbeforedatacanbetransferred.ATCPconnectionisidentifiedby:srce IPaddr,srce port,dest IPaddr,destport

TCP‐PDUsarecalledTCPsegmentsbytesaccumulatedinbufferuntilsendingTCPdecidestocreateasegmentMSS=maximum“segment“size(maximumdatapartsize)536bytesbydefaultforIPv4operation(576bytesIPv4packet),1220bydefaultforIPv6operation(1280bytesIPv6packets)

SequencenumberscountsnumberofbytesTCPbuildssegmentsindependentofhowapplicationdataisbroken

unlikeUDPseelab2foranexample

TCPsegmentsshouldnotbefragmentedatsource

TCP dataTCP hdr

IP data = TCP segmentIP hdr

prot=TCP

A TCP server is, by definition…

1. …anapplicationprogramthatdoeslisten()andaccept()onaTCPsocket

2. …anapplicationprogramthatdoesreceive()onaTCPsocket

3. …anapplicationprogramthatdoessend()onaTCPsocket

4. Idon’tknow

42

… an

 appli

catio

n pro

gra...

… an

 appli

catio

n pro

gra...

… an

 appli

catio

n pro

gra...

I don

’t kno

w

0% 0%0%0%

Why both TCP and UDP ?

MostapplicationsuseTCPratherthanUDP,asthisavoidsre‐inventingerrorrecoveryineveryapplicationButsomeapplicationsdonotneederrorrecoveryinthewayTCPdoesit(i.e.bypacketretransmission)

Forexample:Voiceapplications/PMUstreamingQ.why?

Forexample:anapplicationthatsendsjustonemessage,likenameresolution(DNS).Q.Why?

Forexample:multicast(TCPdoesnotsupportmulticastIPaddresses)

43

4. More TCP Bells and Whistles

TCPaddressesalargenumberofsmallissues.Forexample,problemstobesolvedare

Whentosendapacket(applicationmaywrite1byteintothesocket;shouldTCPmakeonepacket?)‐>Nagle’salgorithmWhentosendanACKwhenthereisnodatatosendinreturn?Whentoincreasethewindowsize(sillywindowsyndromeavoidance)?Howtochooseinitialsequencenumbers(e.g.SYNcookiestoavoiddenialofserviceattackbySYNflooding)Whentoretransmitapacket?

Wewillseeonlythelastoneindetail;seetextbooksection4.3.3fortheonewedon’tseehere.

45

We could say that TCP declares a packet lost when a duplicate ACK is received with a SACK field. Is it a good 

idea ?1. Yesbecauseitislikely

thatthereissomemissingdate

2. Noasitmaycausesuperfluousretransmissions(somedatacouldsimplybelate‐‐ outoforder)

3. NobecauseanACKalsocouldbelost

4. Idon’tknow.

46

Yes b

ecause it is likely tha

No as it m

ay cause super

No because an ACK also .

I don’ t know.

0% 0%0%0%

Fast Retransmit

Principle:when duplicateACKsarereceived,declarealoss(DuplicateACK=aTCPpacketwheretheACKvaluerepeatsapreviouslyreceivedACKvalue)

ThelostdataisinferredfromtheSACKblocksTcpMaxDupACKs issetbytheOperatingSystem(often2or

3)

47

P1 P2 P3 P4

ack=1000ack=2000

P5 P6retransmit

P3 P7

ack=2000sack =3000:4000

ack=2000sack =3000:5000

ack=2000sack =3000:6000

1 2 3 4 5

ack =?

6all segments are 1000 bytes; TcpMaxDupACKs = 3

Loss Detection in TCP also uses timers

“Fastretransmit”detectsmostlossesbutnotallburstsoflossesarenotdetectedisolatedpacketsthatarelostarenotdetected

TCPalsousesaretransmittimer,setforeverypackettransmission

whenonetimerexpiresthisisinterpretedasasevereloss(lossofchannel).Alltimersareresetandalldataismarkedasneedingretransmission.

48

49

Round Trip EstimationWhy ?Theretransmissiontimermustbesetatavalueslightlylargerthantheroundtriptime,buttoomuchlargerWhat ? RTTestimationcomputesanupperboundRTOontheroundtriptimeHow ?

≔ ≔

≔ ℓ ,

50

Sample RTO

0246

8101214

1 11 21 31 41 51 61 71 81 91 101

111

121

131

141

seconds

seconds

RTO

SampledRTT

At 8, the loss was detected …

51

1. Byexpirationofretransmittimesetat3

2. Byfastretransmit(withTcpMaxDupACKs=1)

3. Byfastretransmit(withTcpMaxDupACKs=2)

4. Either1or25. Either1or36. Either2or37. Either1,2or38. Noneoftheabove9. Idon’tknow

By ex

piratio

n of re

trans

mit ti..

By fa

st re

trans

mit (w

ith Tcp

M...

By fa

st re

trans

mit (w

ith Tcp

M...

Either 1 or 2

Either 1 or 3

Either 2 or 3

Either 1, 2

 or 3

None

 of the

 abov

eI d

on’t kn

ow

0% 0% 0% 0%0%0%0%0%0%

5. Error Recovery

Wehaveseenhow TCPrepairslossesWenowdiscusswhythisisso,andsometimeswhyitisnotso

53

The Philosophy of Errors in a Layered Model

Thephysicallayerisnotcompletelyerror‐free– thereisalwayssomebiterrorrate(BER).

InformationtheorytellsusthatforeverychannelthereisacapacityCsuchthatAtanyrateR<C,arbitrarilysmallBERcanbeachievedAtratesR¸ C,anyBERsuchthatH2(BER)>1– C/Risachievable,withH2(p)=entropy=– plog2(p)– (1– p)log2(1– p)

TheTCP/IParchitecturedecidedEverylayer¸ 2offersanerrorfree servicetotheupperlayer:

SDUsareeitherdeliveredwithouterrorordiscarded

Example:MAClayerQ1. HowdoesanEthernetadapterknowwhetherareceivedEthernetframeshassomebiterrors?Whatdoesitdowiththeframe?WiFi detectserrorswithCRCanddoesretransmissions ifneededQ2.WhydoesnotEthernetdothesame?

54

The Philosophy of Errors in a Layered Model

Example:MAClayerQ1. HowdoesanEthernetadapterknowwhetherareceivedEthernetframeshassomebiterrors?Whatdoesitdowiththeframe?A1. ItcheckstheCRC.Ifthereisanerror,theframeisdiscardedWiFi detectserrorswithCRCanddoesretransmissions ifneededQ2.WhydoesnotEthernetdothesame?A2. BERisverysmalloncabledsystems,notonwireless

55

The Layered Model Transforms Errors into Packet Losses

PacketlossesoccurduetoerrordetectionbyMACbuffer overflowinbridgesandroutersOtherexceptionalerrorsmayoccurtoo

Therefore,packetlossesmustberepaired.

Thiscanbedoneeitherendtoend :hostAsends10packetstohostB.BverifiesifallpacketsarereceivedandasksforAtosendagainthemissingones.orhopbyhop

56

A R1 R2 BP1

P1

P1P2

P2

P2P3

P4

P4

P4P3 is missing

P3

P3

P3

A R1 R2 BP1

P1

P1P2

P2

P2P3

P3

P3

P3

P3 is missing

P4

P4

P4

57

Theend‐to‐endphilosophyoftheinternetsays:keepintermediatesystemsassimpleaspossibleIPpacketsmayfollowparallelpaths,thisisincompatiblewithhop‐by‐hoprecovery.

R2seesonly3outof7packetsbutshouldnotaskR1forre‐transmisison

The Case for End‐to‐end Error Recovery

R2 BA

R3

R4R1

14 23567

The Case for Hop‐By‐Hop Error Recovery

Therearealsoargumentsinfavour ofhop‐by‐hopstrategy.Tounderstandthem,wewillusethefollowingresult.

Capacityoferasurechannel:considerachannelwithbitrateRthateitherdeliverscorrectpacketsorlosesthem.Assumethelossprocessisstationary,suchthatthepacketlossrateisp [0,1].Thecapacityis

packets/sec.

Thismeansinpracticethat,forexample,overalinkat10Mb/sthathasapacketlossrateof10%wecantransmit9Mb/sofusefuldata.

58

The Capacity of the End‐to‐End Path

Wecannowcomputethecapacityofanend‐to‐endpathwithbotherrorrecoverystrategies.

Assumptions:samepacketlossrateponklinks;samenominalrateR.Lossesareindependent.Q. computethecapacitywithend‐to‐endandwithhopbyhoperrorrecovery.

59

The capacity with hop‐by‐hop error recovery is …

60

4. Idon’tknow 

 1 =

  1−

  

  1 

= 1−

  1 

= 1−

I don

’t kn

ow

25% 25%25%25%

End‐to‐end Error Recovery is Inefficient when Packet Error Rate is high

Q.Howcanonereconciletheconflictingargumentsforandagainsthop‐by‐hoperrorrecovery?

62

k Packetlossrate

C1 (end‐to‐end)

C2 (hop‐by‐hop)

10 0.05 0.6 R 0.95R

10 0.0001 0.9990R 0.9999R

Where is Error Recovery located in the TCP/IP architecture ?

TheTCP/IParchitectureassumesthat1. TheMAClayerprovideserror—free packetstothenetworklayer2. ThepacketlossrateattheMAClayer (betweentworouters,orbetween

arouterandahost)mustbemadeverysmall.ItisthejoboftheMAClayertoachievethis.

3. Errorrecoverymustalsobeimplementedend‐to‐end.

Thus,packetlossesarerepairedAttheMAClayeronlossy channels(wireless)WiFi repairslosseswitharepetitionmechanismsimilartoTCPbutsimpler,window=1packetIntheendsystems(transportlayerbyTCPorapplicationlayerifUDPisused).

64

Conclusion

ThetransportlayerinTCP/IPexistsintwoflavoursreliableandstreamoriented:TCPunreliableandmessagebased:UDP

Othertransportlayerprotocolsexistbuttheiruseismarginalex:STCP(reliable+messagebased)

TCPusesslidingwindowandselectiverepeatwindowflowcontrolcongestioncontrol– seelater

65