Transport Layer 3-1
Transport Layer Outline
3.1 Transport-layer services
3.2 Multiplexing and demultiplexing
3.3 Connectionless transport: UDP
3.4 Principles of reliable data transfer
3.5 Connection-oriented transport: TCP (segment structure, reliable data transfer, flow control, connection management)
3.6 Principles of congestion control
3.7 TCP congestion control
Transport Layer 3-2
Recap: rdt3.0 sender (Stop-and-wait)
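States: Wait for call 0 from above, Wait for ACK0, Wait for call 1 from above, Wait for ACK1.
Transitions (state: event / actions → next state):
• Wait for call 0 from above: rdt_send(data) / sndpkt = make_pkt(0, data, checksum); udt_send(sndpkt); start_timer → Wait for ACK0
• Wait for call 0 from above: rdt_rcv(rcvpkt) / no action
• Wait for ACK0: rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,1) ) / no action, keep waiting
• Wait for ACK0: timeout / udt_send(sndpkt); start_timer
• Wait for ACK0: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,0) / stop_timer → Wait for call 1 from above
• Wait for call 1 from above: rdt_rcv(rcvpkt) / no action
• Wait for call 1 from above: rdt_send(data) / sndpkt = make_pkt(1, data, checksum); udt_send(sndpkt); start_timer → Wait for ACK1
• Wait for ACK1: rdt_rcv(rcvpkt) && ( corrupt(rcvpkt) || isACK(rcvpkt,0) ) / no action, keep waiting
• Wait for ACK1: timeout / udt_send(sndpkt); start_timer
• Wait for ACK1: rdt_rcv(rcvpkt) && notcorrupt(rcvpkt) && isACK(rcvpkt,1) / stop_timer → Wait for call 0 from above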
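The sender FSM above can be sketched as a small C state machine. This is a minimal illustration, not the slides' code: the event names (`EV_RDT_SEND`, `EV_RCV_GOOD`, ...) and the `struct sender` fields are hypothetical, and `make_pkt`, `udt_send` and the timer are reduced to a counter and flags.

```c
#include <assert.h>

/* The four sender states of the rdt3.0 FSM. */
enum state { WAIT_CALL_0, WAIT_ACK_0, WAIT_CALL_1, WAIT_ACK_1 };

/* Hypothetical labels for the FSM inputs. */
enum event {
    EV_RDT_SEND,  /* rdt_send(data) from above */
    EV_RCV_GOOD,  /* rdt_rcv(rcvpkt) && notcorrupt(rcvpkt); ack_num is its ACK bit */
    EV_RCV_BAD,   /* rdt_rcv(rcvpkt) && corrupt(rcvpkt) */
    EV_TIMEOUT
};

struct sender {
    enum state st;
    int last_sent_seq;  /* seq bit of sndpkt, -1 before the first send */
    int sends;          /* number of udt_send calls (stub) */
    int timer_running;  /* 1 while the timer runs (stub) */
};

static void send_pkt(struct sender *s, int seq) {
    s->last_sent_seq = seq;  /* sndpkt = make_pkt(seq, data, checksum) */
    s->sends++;              /* udt_send(sndpkt) */
    s->timer_running = 1;    /* start_timer */
}

/* Apply one event; ack_num is only meaningful for EV_RCV_GOOD. */
void rdt30_step(struct sender *s, enum event ev, int ack_num) {
    assert(ev != EV_RCV_GOOD || ack_num == 0 || ack_num == 1);
    switch (s->st) {
    case WAIT_CALL_0:
    case WAIT_CALL_1: {
        int seq = (s->st == WAIT_CALL_0) ? 0 : 1;
        if (ev == EV_RDT_SEND) {
            send_pkt(s, seq);
            s->st = (seq == 0) ? WAIT_ACK_0 : WAIT_ACK_1;
        }
        /* rdt_rcv(rcvpkt) while waiting for a call: no action */
        break;
    }
    case WAIT_ACK_0:
    case WAIT_ACK_1: {
        int seq = (s->st == WAIT_ACK_0) ? 0 : 1;
        if (ev == EV_TIMEOUT) {
            send_pkt(s, seq);              /* resend sndpkt, restart timer */
        } else if (ev == EV_RCV_GOOD && ack_num == seq) {
            s->timer_running = 0;          /* stop_timer */
            s->st = (seq == 0) ? WAIT_CALL_1 : WAIT_CALL_0;
        }
        /* corrupt packet or ACK for the other seq #: keep waiting */
        break;
    }
    }
}
```

For example, starting in `WAIT_CALL_0`, a send moves the sender to `WAIT_ACK_0`; an ACK carrying 1 is ignored there, a timeout resends packet 0, and an uncorrupted ACK 0 stops the timer and moves the sender to `WAIT_CALL_1`.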
Transport Layer 3-3
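Recap: rdt3.0 stop-and-wait operation
Timeline between sender and receiver:
• t = 0: first packet bit transmitted
• t = L / R: last packet bit transmitted
• first packet bit arrives; last packet bit arrives, receiver sends ACK
• t = RTT + L / R: ACK arrives, sender sends next packet
Sender utilization (times in milliseconds: L/R = 0.008 ms, i.e. 8 microseconds, and RTT = 30 ms):
$$U_{\text{sender}} = \frac{L/R}{RTT + L/R} = \frac{0.008}{30.008} = 0.00027$$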
Transport Layer 3-4
Recap: Pipelining: increased utilization
Timeline between sender and receiver (three packets sent back-to-back):
• t = 0: first packet bit transmitted
• t = L / R: last bit transmitted
• first packet bit arrives; last packet bit arrives, send ACK
• last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK
• t = RTT + L / R: ACK arrives, send next packet
Sender utilization:
$$U_{\text{sender}} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$
Utilization increases by a factor of 3!
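Both utilization figures come from one formula with n packets sent back-to-back per RTT. A minimal C sketch; the function name and the use of milliseconds are assumptions:

```c
#include <assert.h>

/* Sender utilization with n packets of transmission time L/R sent
 * back-to-back before the first ACK returns:
 *   U = n * (L/R) / (RTT + L/R)
 * n = 1 is stop-and-wait; n = 3 matches the pipelining example.
 * trans_time (L/R) and rtt must use the same unit. */
double sender_utilization(int n, double trans_time, double rtt) {
    assert(n >= 1 && trans_time > 0 && rtt >= 0);
    double u = n * trans_time / (rtt + trans_time);
    return u > 1.0 ? 1.0 : u;  /* the sender cannot be busy more than 100% */
}
```

With L/R = 0.008 ms and RTT = 30 ms, `sender_utilization(1, 0.008, 30.0)` is about 0.00027 and `sender_utilization(3, 0.008, 30.0)` about 0.0008. The cap reflects that once n * L/R exceeds RTT + L/R, the sender is never idle.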
Transport Layer 3-5
Recap: GBN for Pipelined Error Recovery
Sender:
• k-bit sequence # in packet header
• "window" of up to N consecutive unacknowledged (sent or can-be-sent) packets allowed
• the window moves forward when its first sent packet is acknowledged (standard behavior); a cumulative ACK can move it past several packets at once
The sender must respond to three types of events:
1. Invocation from above: the application layer tries to send a packet. If the window is full, the data is returned to the application; otherwise the packet is accepted and sent.
2. Receipt of an ACK: ACK(n) indicates that all packets up to and including seq # n have been received ("cumulative ACK"). The sender may receive duplicate ACKs (when the receiver receives out-of-order packets).
3. A timeout event (the only cause of retransmission): a timer for each in-flight packet; if a timeout occurs, retransmit all packets that have been sent but not yet acknowledged.
The window cannot contain acknowledged packets.
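The three GBN sender events map onto a little bookkeeping. A minimal C sketch, assuming unbounded integer sequence numbers (a real sender wraps them modulo 2^k) and omitting packet construction and timers; `struct gbn` and the function names are hypothetical:

```c
#include <assert.h>

/* Go-Back-N sender window bookkeeping (sketch). */
struct gbn {
    int base;        /* oldest unacknowledged seq # */
    int nextseqnum;  /* next seq # to use */
    int N;           /* window size */
};

/* Invocation from above: returns 1 if the packet is accepted and sent,
 * 0 if the window is full and the data is returned to the application. */
int gbn_send(struct gbn *g) {
    if (g->nextseqnum >= g->base + g->N)
        return 0;
    /* make_pkt, udt_send and timer handling omitted */
    g->nextseqnum++;
    return 1;
}

/* Receipt of ACK(n): cumulative, so everything up to and including n is
 * acknowledged. Old (duplicate) ACKs leave the window unchanged. */
void gbn_ack(struct gbn *g, int n) {
    assert(n < g->nextseqnum);  /* cannot ACK a packet never sent */
    if (n >= g->base)
        g->base = n + 1;
}

/* Timeout: every sent but unacknowledged packet goes out again.
 * Returns how many packets are retransmitted. */
int gbn_timeout(const struct gbn *g) {
    return g->nextseqnum - g->base;
}
```

With N = 4, a fifth send is refused until ACK(1) slides the base to 2; a timeout at that point retransmits packets 2 and 3.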
Transport Layer 3-6
Recap: Selective repeat for error recovery
Window may contain acknowledged pkts (unlike GBN)
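To make the difference from GBN concrete, here is a hypothetical selective-repeat sender window in C that keeps one ACK flag per packet, so acknowledged packets can sit inside the window. The fixed bound `SR_MAX` and all names are assumptions of this sketch:

```c
#include <assert.h>
#include <stdbool.h>

#define SR_MAX 64  /* hypothetical bound on packets tracked by this sketch */

/* Selective-repeat sender window: packets inside the window may already
 * be acknowledged while an earlier one is still outstanding. */
struct sr {
    int base;            /* smallest unacknowledged seq # */
    int N;               /* window size */
    bool acked[SR_MAX];  /* per-packet ACK flags, indexed by seq # */
};

/* An SR ACK covers exactly one packet (no cumulative meaning). The window
 * slides only when its base is acknowledged, then skips every packet that
 * was already acknowledged out of order. */
void sr_ack(struct sr *s, int seq) {
    assert(seq >= 0 && seq < SR_MAX);
    if (seq < s->base || seq >= s->base + s->N)
        return;  /* outside the window: ignore */
    s->acked[seq] = true;
    while (s->base < SR_MAX && s->acked[s->base])
        s->base++;
}
```

After `sr_ack(&s, 2)` the base stays at 0 with packet 2 marked as acknowledged inside the window, which GBN never allows.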
Transport Layer 3-7
TCP: Overview (RFCs: 793, 1122, 1323, 2018, 2581)
• full-duplex data: bi-directional data flow in the same connection at the same time
• flow controlled: sender will not overwhelm receiver
• point-to-point: one sender, one receiver (no one-to-many multicast)
• connection-oriented: processes must handshake before sending data; the three-way handshake (exchange of control messages) initializes sender and receiver state before data exchange
• pipelined: TCP congestion and flow control set the window size
• send & receive buffers: set aside during the 3-way handshake
[Figure: the application writes data through its socket door into the TCP send buffer; TCP sends segments to the peer's TCP receive buffer, from which the application reads data through its socket door.]
Transport Layer 3-8
TCP: Overview (cont.)
Maximum Segment Size (MSS):
• Defined as the maximum amount of application-layer data in a TCP segment (the header is not counted).
• TCP grabs data in chunks from the send buffer; the maximum chunk size is the MSS. A TCP segment therefore contains a TCP header plus at most MSS bytes of data.
• MSS is set by first determining the largest link-layer frame (Maximum Transmission Unit, MTU) that can be sent by the local host.
• MSS is set so that a segment of MSS bytes, once wrapped in TCP and IP headers, fits into a single link-layer frame. A common MTU is 1500 bytes (Ethernet), giving a typical MSS of 1460 bytes (1500 minus 20 bytes of TCP header and 20 bytes of IP header); 536 and 512 bytes are other common MSS values.
TCP sequence #s:
• Both sides randomly choose initial seq #s (rather than starting at 0) to avoid accepting segments from older connections that used the same ports.
• TCP views data as an unstructured, but ordered, stream of bytes, so seq #s count bytes of the stream. For a file of 500,000 bytes and MSS = 1,000 bytes, segment seq #s are 0, 1000, 2000, etc. (numbering the first byte 0).
TCP acknowledgement #s:
• Cumulative ACKs: TCP only ACKs bytes up to the first missing byte in the stream. The TCP RFCs do not specify how to handle out-of-order segments.
• The ACK # field carries the sequence number of the next byte the host expects from the other side.
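The MSS and sequence-number arithmetic can be checked with a short C sketch. It assumes 20-byte IP and TCP headers without options (so MSS = MTU - 40) and uses hypothetical function names:

```c
#include <assert.h>
#include <stdint.h>

/* MSS from the link MTU, assuming 20-byte IP and 20-byte TCP headers
 * with no options (options would shrink the MSS further). */
unsigned mss_from_mtu(unsigned mtu) {
    assert(mtu > 40);
    return mtu - 40;
}

/* Number of segments needed for file_size bytes (rounded up). */
uint32_t segment_count(uint32_t file_size, uint32_t mss) {
    assert(mss > 0);
    return file_size / mss + (file_size % mss != 0);
}

/* Seq # of the i-th segment (0-based): byte-stream number of its first
 * data byte, counted from the initial sequence number isn. Unsigned
 * 32-bit arithmetic wraps modulo 2^32, as TCP sequence numbers do. */
uint32_t segment_seq(uint32_t isn, uint32_t i, uint32_t mss) {
    return isn + i * mss;
}
```

With these helpers, an MTU of 1500 gives an MSS of 1460, an MTU of 576 gives 536, and the 500,000-byte file needs 500 segments with seq #s 0, 1000, 2000, ... when the ISN is 0.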
Transport Layer 3-9
TCP segment structure
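Layout (each row is 32 bits wide):
• source port # (16 bits) | dest port # (16 bits)
• sequence number (32 bits)
• acknowledgement number (32 bits)
• header length (4 bits) | not used | flags U A P R S F | receive window (16 bits)
• checksum (16 bits) | urgent data pointer (16 bits)
• options (variable length)
• application data (variable length)
Field notes:
• sequence and acknowledgement numbers count bytes of data (not segments!); the largest file that can be sent without the 32-bit sequence numbers wrapping around is 2^32 bytes (4 GB); total # of segments = file size / MSS
• header length: 4 bits, in 32-bit words
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now to the upper layer
• SYN/FIN: connection setup and close
• RST=1: used in response when a client tries to connect to a non-open server port
• receive window: 16 bits, # bytes the receiver is willing to accept (RcvWindow size)
• checksum: Internet checksum (as in UDP)
• options (variable length): used, e.g., to negotiate the MSS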
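A sketch of reading the fixed 20-byte header fields from raw bytes in C. The byte offsets follow the layout above (header length in the upper 4 bits of byte 12, flags in byte 13); the struct and function names are hypothetical, options are not parsed, and checksum verification is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits in byte 13 of the TCP header (U A P R S F). */
enum { TCP_FIN = 0x01, TCP_SYN = 0x02, TCP_RST = 0x04,
       TCP_PSH = 0x08, TCP_ACK = 0x10, TCP_URG = 0x20 };

struct tcp_hdr {
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    unsigned header_len;  /* in bytes; the 4-bit field counts 32-bit words */
    uint8_t flags;
    uint16_t rcv_window, checksum, urg_ptr;
};

/* Fields are in network (big-endian) byte order. */
static uint16_t get16(const uint8_t *p) { return (uint16_t)(p[0] << 8 | p[1]); }
static uint32_t get32(const uint8_t *p) {
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | p[3];
}

/* Parse the fixed header. Returns 0 on success, -1 if the buffer is too
 * short or the header length field is invalid. */
int tcp_parse(const uint8_t *buf, size_t len, struct tcp_hdr *h) {
    assert(buf != NULL && h != NULL);
    if (len < 20)
        return -1;
    h->src_port   = get16(buf);
    h->dst_port   = get16(buf + 2);
    h->seq        = get32(buf + 4);
    h->ack        = get32(buf + 8);
    h->header_len = (buf[12] >> 4) * 4u;
    h->flags      = buf[13] & 0x3F;
    h->rcv_window = get16(buf + 14);
    h->checksum   = get16(buf + 16);
    h->urg_ptr    = get16(buf + 18);
    return (h->header_len < 20 || h->header_len > len) ? -1 : 0;
}
```

Fed the first 20 bytes of a SYN like the one in the connection-setup trace later in this chapter (port 3123 to port 80, seq 4019802004), the parser reports `flags == TCP_SYN`, a 20-byte header and a window of 65535 when those values are placed in the buffer.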
Transport Layer 3-10
Seq Numbers and Ack Numbers
Suppose a data stream of size 500,000 bytes, an MSS of 1,000 bytes, and that the first byte of the data stream is numbered zero. Seq numbers of the segments:
• 1st seg: 0; 2nd seg: 1000; 3rd seg: 2000, …
Ack number: assume host A is sending segments to host B. Because TCP is full-duplex, A may be receiving data from B simultaneously. The ack number that host B puts in its segment is the seq number of the next byte B is expecting from A.
• Example: B has received all bytes numbered 0 through 535 from A. If B is about to send a segment to host A, the ack number in its segment should be 536.
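The rule "ack number = next byte expected" can be sketched as a function that walks the received segments from the next expected byte until it reaches the first missing byte. Names are hypothetical and 32-bit wraparound is ignored:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A received segment: number of its first byte and payload length. */
struct seg { uint32_t seq; uint32_t len; };

/* Cumulative ACK: the first byte not yet received, starting from
 * next_expected. Segments beyond a gap do not advance it. */
uint32_t cumulative_ack(uint32_t next_expected, const struct seg *segs, size_t n) {
    assert(segs != NULL || n == 0);
    int advanced = 1;
    while (advanced) {  /* repeat until no segment extends the prefix */
        advanced = 0;
        for (size_t i = 0; i < n; i++) {
            uint32_t end = segs[i].seq + segs[i].len;  /* one past last byte */
            if (segs[i].seq <= next_expected && end > next_expected) {
                next_expected = end;
                advanced = 1;
            }
        }
    }
    return next_expected;
}
```

Bytes 0 through 535 received gives 536, as in the example; with bytes 0-149 and 200-299 received, the ACK stays at 150 until the gap is filled.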
Transport Layer 3-11
TCP seq. #’s and ACKs - Telnet example
Telnet uses "echo back" to ensure that characters seen by the user have already been received and processed at the server.
Assume the starting seq #s are 42 and 79 for client and server, respectively. After the connection is established, the client is waiting for byte 79 and the server for byte 42.
• Seq #s: byte-stream "number" of the first byte in the segment's data
• ACKs: seq # of the next byte expected from the other side (cumulative ACK)
Simple telnet scenario (Host A = client, Host B = server), in time order:
1. User types 'C': A sends Seq=42, ACK=79, data = 'C'
2. B ACKs receipt of 'C' and echoes back 'C': B sends Seq=79, ACK=43, data = 'C'
3. A ACKs receipt of the echoed 'C': A sends Seq=43, ACK=80
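The telnet exchange can be replayed with two counters per host. A minimal C sketch with hypothetical names that handles only in-order data:

```c
#include <assert.h>
#include <stdint.h>

/* Per-host sequence state (sketch). */
struct host {
    uint32_t next_seq;  /* seq # of the next byte this host will send */
    uint32_t rcv_next;  /* next byte expected from the peer = ACK value */
};

struct segment { uint32_t seq, ack, len; };

/* Build a segment carrying len data bytes (len = 0 for a pure ACK). */
struct segment host_send(struct host *h, uint32_t len) {
    struct segment s = { h->next_seq, h->rcv_next, len };
    h->next_seq += len;
    return s;
}

/* Receive an in-order segment: its bytes advance the cumulative ACK. */
void host_recv(struct host *h, struct segment s) {
    assert(s.seq == h->rcv_next);  /* sketch handles in-order data only */
    h->rcv_next += s.len;
}
```

Starting from A = {42, 79} and B = {79, 42}, the three sends produce exactly the segments listed above: (42, 79), (79, 43) and (43, 80).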
Transport Layer 3-12
TCP Round Trip Time and Timeout
Q: how to set the TCP timeout value? (timer management)
• based on RTT: longer than RTT, but RTT varies
• too short: premature timeout, unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission (handing the segment to IP) until ACK receipt; ignore retransmissions (why?)
• SampleRTT will vary from segment to segment; we want the estimated RTT to be "smoother", so average several recent measurements, not just the current SampleRTT
• TCP maintains an average called EstimatedRTT and uses it to calculate the timeout value
Transport Layer 3-13
TCP Round Trip Time (RTT) and Timeout
$$\text{EstimatedRTT} = (1 - \alpha) \cdot \text{EstimatedRTT}_{\text{prior}} + \alpha \cdot \text{SampleRTT}_{\text{current}}$$
Exponential Weighted Moving Average (EWMA):
• puts more weight on recent samples than on old ones
• the influence of a past sample decreases exponentially fast
• typical value: $\alpha = 0.125$, so the formula becomes:
$$\text{EstimatedRTT} = 0.875 \cdot \text{EstimatedRTT}_{\text{prior}} + 0.125 \cdot \text{SampleRTT}_{\text{current}}$$
Why TCP ignores retransmissions when calculating SampleRTT: suppose the source sends packet P1, the timer for P1 expires, and the source then sends P2, a new copy of the same packet. Further suppose the source measures SampleRTT for P2 (the retransmitted packet) and that shortly after transmitting P2 an acknowledgment for P1 arrives. The source will mistakenly take this acknowledgment as an acknowledgment for P2 and calculate an incorrect value of SampleRTT.
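A minimal C sketch of the EWMA update, plus a helper showing how fast an old sample's weight decays. Function names are hypothetical:

```c
#include <assert.h>

/* One EWMA step:
 * EstimatedRTT = (1 - alpha) * EstimatedRTT + alpha * SampleRTT */
double ewma_update(double estimated, double sample, double alpha) {
    assert(alpha > 0.0 && alpha <= 1.0);
    return (1.0 - alpha) * estimated + alpha * sample;
}

/* Weight that a sample taken k updates ago still carries in
 * EstimatedRTT: alpha * (1 - alpha)^k, an exponential decay in k. */
double sample_weight(double alpha, int k) {
    double w = alpha;
    for (int i = 0; i < k; i++)
        w *= 1.0 - alpha;
    return w;
}
```

With alpha = 0.125, `ewma_update(100.0, 180.0, 0.125)` returns 110.0, and each later update multiplies a sample's remaining weight by 0.875, so it roughly halves every five updates.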
Transport Layer 3-14
RTT Sample Ambiguity
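Karn's RTT estimator: if a segment has been retransmitted:
• don't count RTT samples on ACKs for this segment
• keep the backed-off timeout for the next packet
• reuse the RTT estimate only after one successful transmission
[Figure: two timelines between hosts A and B, each with an original transmission, a retransmission and an ACK. Left: SampleRTT measured from the original transmission. Right: SampleRTT measured from the retransmission. Since the ACK cannot be matched unambiguously to one transmission, either sample may be wrong.]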
Transport Layer 3-15
Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and EstimatedRTT plotted as RTT (milliseconds) versus time (seconds)]
Transport Layer 3-16
TCP Round Trip Time and Timeout
Setting the timeout:
• EstimatedRTT plus a "safety margin"
• large variation in EstimatedRTT → larger safety margin
First estimate how much SampleRTT deviates from EstimatedRTT:
$$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$$
(typically, $\beta = 0.25$)
Then set the timeout interval:
$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$$
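The two averages and the timeout combine as below. A minimal C sketch with alpha = 0.125 and beta = 0.25; it updates DevRTT against the EstimatedRTT value from before the new sample, which is the order RFC 6298 uses, and it omits RFC 6298's 1-second minimum. Names are hypothetical:

```c
#include <assert.h>

/* Running RTT state: EstimatedRTT and DevRTT (same time unit). */
struct rtt_est {
    double estimated;
    double dev;
};

/* Fold one SampleRTT into both averages (alpha = 0.125, beta = 0.25).
 * DevRTT is updated first, against the previous EstimatedRTT. */
void rtt_update(struct rtt_est *r, double sample) {
    const double alpha = 0.125, beta = 0.25;
    assert(sample >= 0);
    double diff = sample - r->estimated;
    if (diff < 0)
        diff = -diff;  /* |SampleRTT - EstimatedRTT| */
    r->dev = (1 - beta) * r->dev + beta * diff;
    r->estimated = (1 - alpha) * r->estimated + alpha * sample;
}

/* TimeoutInterval = EstimatedRTT + 4 * DevRTT */
double timeout_interval(const struct rtt_est *r) {
    return r->estimated + 4 * r->dev;
}
```

Starting from EstimatedRTT = 100 ms and DevRTT = 10 ms, a 120 ms sample gives DevRTT = 12.5 ms, EstimatedRTT = 102.5 ms and TimeoutInterval = 152.5 ms.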
Transport Layer 3-17
TCP: connection-oriented transport (segment structure, RTT estimation and timeout, reliable data transfer, flow control, connection management)
Transport Layer 3-18
TCP reliable data transfer
• TCP creates an rdt service on top of IP's unreliable service
• pipelined segments
• cumulative ACKs
• TCP uses a single retransmission timer, as multiple timers require considerable overhead
• retransmissions are triggered by timeout events and duplicate ACKs
Initially consider a simplified TCP sender:
• ignore duplicate ACKs
• ignore flow control and congestion control
Transport Layer 3-19
TCP sender events:
Data received from application:
• create a segment with seq #; the seq # is the byte-stream number of the first data byte in the segment
• start the timer if it is not already running for some other segment (think of the timer as belonging to the oldest unacknowledged segment)
• expiration interval: TimeoutInterval
Timeout:
• retransmit the segment that caused the timeout
• restart the timer
ACK received:
• if the ACK field (cumulative ACK) acknowledges previously unacknowledged segments: update the expected ACK # (SendBase)
• restart the timer if there are currently unacknowledged segments
Transport Layer 3-20
TCP sender(simplified)
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum

loop (forever) {
    switch(event)

    event: data received from application above
        create TCP segment with sequence number NextSeqNum
        if (timer currently not running)
            start timer
        pass segment to IP
        NextSeqNum = NextSeqNum + length(data)

    event: timer timeout
        retransmit not-yet-acknowledged segment with smallest sequence number
        start timer

    event: ACK received, with ACK field value of y
        if (y > SendBase) {
            SendBase = y
            if (there are currently not-yet-acknowledged segments)
                start timer
        }
} /* end of loop forever */

Comment:
• SendBase-1: last cumulatively ACKed byte
• Example: SendBase-1 = 71; y = 73, so the receiver wants bytes from 73 onward; y > SendBase, so new data is ACKed
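The simplified sender can be sketched in C as three event handlers over SendBase and NextSeqNum. Segment construction and the real timer are omitted (the timer is a flag), sequence wraparound is ignored, and all names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct tcp_sender {
    uint32_t send_base;     /* SendBase: oldest unacknowledged byte */
    uint32_t next_seq_num;  /* NextSeqNum: next byte to be sent */
    bool timer_running;
};

/* event: data received from application above.
 * Returns the sequence number placed in the new segment. */
uint32_t on_app_data(struct tcp_sender *s, uint32_t len) {
    uint32_t seq = s->next_seq_num;  /* create segment with NextSeqNum */
    if (!s->timer_running)
        s->timer_running = true;     /* start timer */
    /* pass segment to IP (omitted) */
    s->next_seq_num += len;
    return seq;
}

/* event: timer timeout. Returns the seq # to retransmit: the
 * not-yet-acknowledged segment with the smallest sequence number. */
uint32_t on_timeout(struct tcp_sender *s) {
    assert(s->send_base != s->next_seq_num);  /* data is outstanding */
    s->timer_running = true;                  /* start timer */
    return s->send_base;
}

/* event: ACK received, with ACK field value y. */
void on_ack(struct tcp_sender *s, uint32_t y) {
    if (y > s->send_base) {
        s->send_base = y;
        /* restart the timer if unacknowledged data remains; otherwise
         * the timer stops (implied by the pseudocode) */
        s->timer_running = (s->send_base != s->next_seq_num);
    }
}
```

Replaying the premature-timeout scenario: sending 8 bytes at 92 and 20 bytes at 100, a timeout returns 92 for retransmission, ACK 100 moves SendBase to 100 with the timer still running, and ACK 120 moves it to 120 and stops the timer.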
Transport Layer 3-21
TCP: retransmission scenarios
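Lost ACK scenario (Host A sends to Host B):
• A sends Seq=92, 8 bytes data
• B replies ACK=100, but the ACK is lost (X)
• A's timer for Seq=92 expires; A retransmits Seq=92, 8 bytes data
• B replies ACK=100 again; A sets SendBase = 100
Premature timeout scenario:
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B replies ACK=100 and ACK=120, but A's timer for Seq=92 expires before they arrive
• A retransmits Seq=92, 8 bytes data
• the ACKs arrive: SendBase = 100, then SendBase = 120; B answers the retransmitted segment with ACK=120, and SendBase stays 120
On timeout, the sender transmits the not-yet-ACKed segment with the smallest seq #.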
Transport Layer 3-22
TCP retransmission scenarios (more)
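Cumulative ACK scenario:
• A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data
• B's ACK=100 is lost (X), but ACK=120 arrives before A's timeout
• A sets SendBase = 120; the cumulative ACK covers both segments, so nothing is retransmitted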
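TCP implementations use a timeout-doubling technique: each time TCP retransmits, it sets the next timeout interval to twice the previous value, since the timeout may have occurred because the network is congested. The intervals thus grow exponentially after each retransmission; whenever the timer is started after either of the two other events (data received from the application above, or an ACK received), TimeoutInterval is again derived from the most recent EstimatedRTT and DevRTT.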
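A sketch of the doubling rule in C, with hypothetical names: each expiry doubles the interval, and the next start after new data or a new ACK goes back to EstimatedRTT + 4*DevRTT. Real stacks also cap the backed-off value, which this sketch does not model.

```c
#include <assert.h>

struct rto {
    double computed;  /* EstimatedRTT + 4 * DevRTT */
    double current;   /* interval used for the running timer */
};

/* Timer expired: back off exponentially for the retransmission. */
void rto_on_timeout(struct rto *t) {
    t->current *= 2;
}

/* Timer started after new data from above or a new ACK:
 * recompute the interval from the latest estimates. */
void rto_on_new_data_or_ack(struct rto *t, double estimated, double dev) {
    assert(estimated >= 0 && dev >= 0);
    t->computed = estimated + 4 * dev;
    t->current = t->computed;
}
```

Starting at 0.75 s, two consecutive timeouts give 1.5 s and then 3 s; a later ACK with EstimatedRTT = 0.5 s and DevRTT = 0.0625 s brings the interval back to 0.75 s.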
Transport Layer 3-23
TCP ACK generation policy [RFC 1122, RFC 2581]
Event at receiver → TCP receiver action:
1. Arrival of an in-order segment with the expected seq #; all data up to the expected seq # already ACKed → Delayed ACK: wait up to 500 ms for the next segment; if no next segment arrives, send ACK.
2. Arrival of an in-order segment with the expected seq #; one other segment has an ACK pending → Immediately send a single cumulative ACK, ACKing both in-order segments.
3. Arrival of an out-of-order segment with a higher-than-expected seq #; gap detected → Immediately send a duplicate ACK, indicating the seq # of the next expected byte.
4. Arrival of a segment that partially or completely fills the gap → Immediately send ACK, provided that the segment starts at the lower end of the gap.
The policy leaves the buffering of out-of-order segments open (up to the implementation).
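The table can be read as a decision function. A hypothetical C sketch that assumes the receiver already knows whether an ACK is pending and whether out-of-order data is buffered; it covers only the four rows above:

```c
#include <assert.h>
#include <stdint.h>

/* Receiver actions from the ACK generation table. */
enum ack_action {
    DELAYED_ACK,         /* wait up to 500 ms for another segment */
    CUMULATIVE_ACK_NOW,  /* one ACK covering both in-order segments */
    DUP_ACK_NOW,         /* re-ACK the next expected byte */
    ACK_NOW              /* segment fills (part of) the gap */
};

/* seg_seq: first byte of the arriving segment; rcv_next: next expected
 * byte; ack_pending: an in-order segment still awaits its delayed ACK;
 * gap_exists: out-of-order data is buffered beyond rcv_next.
 * (Sketch: ignores wraparound and old duplicate segments.) */
enum ack_action ack_policy(uint32_t seg_seq, uint32_t rcv_next,
                           int ack_pending, int gap_exists) {
    if (seg_seq > rcv_next)
        return DUP_ACK_NOW;           /* out of order: gap detected */
    assert(seg_seq == rcv_next);      /* in-order arrival */
    if (gap_exists)
        return ACK_NOW;               /* starts at lower end of the gap */
    return ack_pending ? CUMULATIVE_ACK_NOW : DELAYED_ACK;
}
```

Each row of the table corresponds to one return path, so the function can double as a checklist when reading a packet trace.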
Transport Layer 3-24
Fast Retransmit
• The timeout period is often relatively long: long delay before resending a lost packet.
• Detect lost segments via duplicate ACKs. A duplicate ACK is an ACK that reacknowledges a segment for which the sender has already received an acknowledgment.
• The sender often sends many segments back-to-back; if a segment is lost, there will likely be many duplicate ACKs.
• If the sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the last ACKed segment was lost and performs a fast retransmit: it resends that segment before the segment's timer expires.
• The algorithm is the result of 15 years of TCP experience!
Transport Layer 3-25
Fast retransmit algorithm:

event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {  /* a duplicate ACK for an already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y  /* fast retransmit */
        }
    }
Transport Layer 3-26
Is TCP a GBN or SR protocol?
• TCP can buffer out-of-order segments (like SR).
• A proposed TCP extension, selective acknowledgment (SACK, RFC 2018), lets the receiver acknowledge out-of-order segments selectively and saves retransmissions (like SR).
• The TCP sender need only maintain the smallest seq # of a transmitted but unacknowledged byte and the seq # of the next byte to be sent (like GBN).
• TCP is a hybrid of GBN and SR.
Transport Layer 3-28
TCP Flow Control
• The receive side of a TCP connection has a receive buffer.
• The app process may be slow at reading from the buffer.
• Flow control: the sender won't overflow the receiver's buffer by transmitting too much, too fast.
• Speed-matching service: matching the send rate to the receiving app's drain rate.
Transport Layer 3-29
TCP Flow control: how it works
(Suppose the TCP receiver discards out-of-order segments.)
• The sender maintains a variable called the receive window, which reflects the spare room in the receiver's buffer:
RcvWindow = RcvBuffer - [LastByteRcvd - LastByteRead]
• TCP is not allowed to overflow the allocated buffer: LastByteRcvd - LastByteRead <= RcvBuffer
• The receiver advertises its spare room by including the value of RcvWindow in the segments it sends; RcvWindow = RcvBuffer at the start of transmission.
• The sender limits unACKed data to RcvWindow: it keeps track of UnAcked data size = LastByteSent - LastByteAcked and ensures UnAcked data size <= RcvWindow.
• When the receiver's RcvWindow = 0, the sender does not block; it keeps sending 1-byte segments, which the receiver ACKs, until RcvWindow becomes larger.
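The receiver's and sender's window arithmetic, sketched in C with hypothetical names. The zero-window branch is a simplification of the 1-byte probing described above (one probe at a time, only when nothing is outstanding):

```c
#include <assert.h>
#include <stdint.h>

/* Receiver: RcvWindow = RcvBuffer - (LastByteRcvd - LastByteRead). */
uint32_t rcv_window(uint32_t rcv_buffer, uint32_t last_byte_rcvd,
                    uint32_t last_byte_read) {
    uint32_t buffered = last_byte_rcvd - last_byte_read;
    assert(buffered <= rcv_buffer);  /* the buffer must never overflow */
    return rcv_buffer - buffered;
}

/* Sender: bytes that may still be sent while keeping
 * LastByteSent - LastByteAcked <= RcvWindow. */
uint32_t sendable_bytes(uint32_t last_byte_sent, uint32_t last_byte_acked,
                        uint32_t advertised_window) {
    uint32_t unacked = last_byte_sent - last_byte_acked;
    if (advertised_window == 0)
        return unacked == 0 ? 1 : 0;  /* 1-byte probe (simplified) */
    return unacked >= advertised_window ? 0 : advertised_window - unacked;
}
```

A 4096-byte buffer holding 2000 unread bytes advertises 2096; a sender with 1000 bytes in flight may then send 1096 more.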
Transport Layer 3-31
Recap: TCP socket interaction
Server (running on hostid):
1. create socket, port=x, for incoming request: welcomeSocket = ServerSocket()
2. wait for incoming connection request: connectionSocket = welcomeSocket.accept()
3. read request from connectionSocket
4. write reply to connectionSocket
5. close connectionSocket
Client:
1. create socket, connect to hostid, port=x: clientSocket = Socket()
2. send request using clientSocket
3. read reply from clientSocket
4. close clientSocket
TCP connection setup takes place between the client's Socket() call and the server's accept().
Transport Layer 3-32
TCP Connection Management
Recall: the TCP sender and receiver establish a "connection" before exchanging data segments, to initialize TCP variables:
• seq. #s
• buffers, flow control info (e.g. RcvWindow)
Client: connection initiator
Socket clientSocket = new Socket("hostname", portNumber);  // portNumber is an int
Server: contacted by client
Socket connectionSocket = welcomeSocket.accept();
Transport Layer 3-33
TCP Connection Management - connecting
Exchange between client and server (time flows downward):
• client → server (connection request): SYN=1, seq=client_isn
• server → client (connection granted): SYN=1, seq=server_isn, ack=client_isn+1
• client → server (ACK): SYN=0, seq=client_isn+1, ack=server_isn+1
Three-way handshake:
Step 1: client host sends TCP SYN segment (SYN bit=1) to server
• specifies initial seq # (client_isn)
• no data
Step 2: server host receives SYN, replies with SYNACK segment
• server allocates buffers
• specifies server initial seq. # (server_isn), with ACK # = client_isn+1
Step 3: client receives SYNACK, replies with ACK # = server_isn+1; this segment may contain data
Transport Layer 3-34
TCP Connection Setup Example
Client SYN: SeqC = 4019802004, window 65535, max. seg. 1260
Server SYN+ACK: Ack = 4019802005 (= SeqC+1); SeqS = 3428951569, window 5840, max. seg. 1460
Client ACK: Ack = 3428951570 (= SeqS+1)
09:23:33.042318 IP 128.2.222.198.3123 > 192.216.219.96.80: S 4019802004:4019802004(0) win 65535 <mss 1260,nop,nop,sackOK>
09:23:33.118329 IP 192.216.219.96.80 > 128.2.222.198.3123: S 3428951569:3428951569(0) ack 4019802005 win 5840 <mss 1460,nop,nop,sackOK>
09:23:33.118405 IP 128.2.222.198.3123 > 192.216.219.96.80: . ack 3428951570 win 65535
sackOK: selective acknowledgment (SACK) permitted
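The handshake numbering can be checked against the trace above. A C sketch with hypothetical names; it relies on uint32_t arithmetic wrapping modulo 2^32, as TCP sequence numbers do:

```c
#include <assert.h>
#include <stdint.h>

/* Expected seq/ack values of the three handshake segments. */
struct handshake {
    uint32_t syn_seq;                 /* step 1: seq = client_isn */
    uint32_t synack_seq, synack_ack;  /* step 2: server_isn, client_isn+1 */
    uint32_t ack_seq, ack_ack;        /* step 3: client_isn+1, server_isn+1 */
};

struct handshake expected_handshake(uint32_t client_isn, uint32_t server_isn) {
    /* each SYN consumes one sequence number although it carries no data */
    struct handshake h = {
        client_isn,
        server_isn, client_isn + 1,
        client_isn + 1, server_isn + 1
    };
    return h;
}
```

With the trace's ISNs, `expected_handshake(4019802004u, 3428951569u)` yields ack values 4019802005 and 3428951570, matching the tcpdump lines; an ISN of 4294967295 wraps its ack to 0.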
Transport Layer 3-35
TCP Connection Management - disconnecting
Closing a connection:
client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment (FIN bit=1) to server
Step 2: server receives FIN, replies with ACK, then closes the connection and sends FIN=1.
[Figure: client sends FIN (close); server replies with ACK, then sends its own FIN (close); client replies with ACK and enters timed wait before the connection is closed.]
Transport Layer 3-36
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
• Enters "timed wait": during this period (typically 30 sec) it responds with an ACK to any received FINs; when the wait ends, the connection is closed and all resources and ports are released.
Step 4: server receives ACK. Connection closed.
[Figure: same FIN/ACK exchange; client: closing → timed wait → closed; server: closing → closed.]
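The client's side of this teardown as a state machine, sketched in C. The state names are the standard RFC 793 names (the slides do not use them), the event names are hypothetical, and simultaneous close is not modeled:

```c
#include <assert.h>

/* Client (active closer) states during teardown. */
enum close_state { ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, TIME_WAIT, CLOSED };

enum close_event { APP_CLOSE, RCV_ACK, RCV_FIN, RCV_FIN_ACK, TIMED_WAIT_EXPIRES };

/* Returns the next state; unexpected events leave the state unchanged. */
enum close_state client_close_step(enum close_state st, enum close_event ev) {
    switch (st) {
    case ESTABLISHED:
        return ev == APP_CLOSE ? FIN_WAIT_1 : st;        /* send FIN */
    case FIN_WAIT_1:
        if (ev == RCV_ACK)
            return FIN_WAIT_2;
        if (ev == RCV_FIN_ACK)
            return TIME_WAIT;                            /* send ACK */
        return st;
    case FIN_WAIT_2:
        return ev == RCV_FIN ? TIME_WAIT : st;           /* send ACK */
    case TIME_WAIT:
        /* re-ACK any retransmitted FIN while waiting */
        return ev == TIMED_WAIT_EXPIRES ? CLOSED : st;
    case CLOSED:
        return st;
    }
    return st;
}
```

The RCV_FIN_ACK transition covers the case where the server's ACK and FIN arrive in one segment, as in the teardown trace that follows.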
Transport Layer 3-37
TCP Connection Teardown Example
Session: echo client on 128.2.222.198, server on 128.2.210.194
Client FIN: SeqC = 1489294581
Server ACK + FIN: Ack = 1489294582 (= SeqC+1); SeqS = 1909787689
Client ACK: Ack = 1909787690 (= SeqS+1)
09:54:17.585396 IP 128.2.222.198.4474 > 128.2.210.194.6616: F 1489294581:1489294581(0) ack 1909787689 win 65434
09:54:17.585732 IP 128.2.210.194.6616 > 128.2.222.198.4474: F 1909787689:1909787689(0) ack 1489294582 win 5840
09:54:17.585764 IP 128.2.222.198.4474 > 128.2.210.194.6616: . ack 1909787690 win 65434
Transport Layer 3-40
Concurrent Server

/* Socket, Bind, Listen, Accept, Fork and Close: error-checking wrappers
   around the corresponding lowercase system calls */
pid_t pid;
int listenfd, connfd;
listenfd = Socket( ... );
/* fill in sockaddr_in{} with server's well-known port */
Bind(listenfd, ... );
Listen(listenfd, LISTENQ);
for ( ; ; ) {
    connfd = Accept(listenfd, ... );  /* probably blocks */
    if ( (pid = Fork()) == 0) {
        Close(listenfd);  /* child closes listening socket */
        doit(connfd);     /* process the request */
        Close(connfd);    /* done with this client */
        exit(0);          /* child terminates */
    }
    Close(connfd);        /* parent closes connected socket */
}
Transport Layer 3-41
Concurrent Server (Cont.)
(a) Status before the call to accept returns
(b) Status after return from accept
(c) Status after fork returns
(d) Status after parent and child close the appropriate sockets
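For reference, a self-contained version of the fork-per-connection loop using the plain POSIX calls instead of the capitalized wrappers. This is a sketch under assumptions: `doit` is a hypothetical echo handler, the backlog of 16 stands in for `LISTENQ`, error handling is minimal, and children are reaped by setting SIGCHLD to SIG_IGN:

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical request handler: echo bytes until the client closes. */
void doit(int connfd) {
    char buf[4096];
    ssize_t n;
    while ((n = read(connfd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {  /* handle short writes */
            ssize_t w = write(connfd, buf + off, (size_t)(n - off));
            if (w <= 0)
                return;
            off += w;
        }
    }
}

/* Returns -1 if setup fails; otherwise serves clients forever. */
int run_server(unsigned short port) {
    int listenfd = socket(AF_INET, SOCK_STREAM, 0);
    if (listenfd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);  /* server's well-known port */
    if (bind(listenfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(listenfd, 16) < 0) {
        close(listenfd);
        return -1;
    }
    signal(SIGCHLD, SIG_IGN);  /* let the kernel reap finished children */
    for (;;) {
        int connfd = accept(listenfd, NULL, NULL);  /* blocks */
        if (connfd < 0)
            continue;
        pid_t pid = fork();
        if (pid == 0) {
            close(listenfd);  /* child closes listening socket */
            doit(connfd);     /* process the request */
            close(connfd);    /* done with this client */
            _exit(0);         /* child terminates */
        }
        close(connfd);  /* parent closes connected socket (also on fork failure) */
    }
}
```

`run_server(port)` does not return once it is listening; each accepted connection is served by a child process while the parent goes straight back to `accept`.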
Transport Layer 3-42
TCP Summary
• TCP properties: point-to-point, connection-oriented, full-duplex, reliable
• TCP segment structure
• How TCP sequence and acknowledgement #s are assigned
• How TCP measures the timeout value needed for retransmissions, using EstimatedRTT and DevRTT
• TCP retransmission scenarios, ACK generation and fast retransmit
• How TCP flow control works
• TCP connection management: 3 segments exchanged to connect and 4 segments exchanged to disconnect