reliable data transfer in transmission control protocol (tcp)
Post on 17-Jan-2016
Reliable Data Transfer in Transmission Control Protocol (TCP)
TCP Data Transfer
TCP creates a reliable, ordered data transfer service on top of IP’s unreliable service
• Pipelined segments
• Cumulative ACKs
• Uses a single retransmission timer
Retransmissions are triggered by:
• timeout events
• duplicate ACKs
Window size is controlled by the receiver and inferred from the network:
• Flow control
• Congestion control
Sender/Receiver Overview
[Figure: sender window and receiver window. Sender regions: Sent & Acked, Sent Not Acked, OK to Send, Not Usable; markers: Last ACK received, Last Frame Sent. Receiver regions: Received & Acked, Acceptable Packet, Not Usable; markers: Next frame expected, Last frame acceptable.]
TCP Sender/Receiver Invariants
[Figure: the sending application writes up to LastByteWritten while the sending TCP tracks LastByteAcked and LastByteSent; the receiving application reads up to LastByteRead while the receiving TCP tracks NextByteExpected and LastByteRcvd.]
Snd: LastByteAcked ≤ LastByteSent ≤ LastByteWritten
Rcv: LastByteRead < NextByteExpected ≤ LastByteRcvd + 1
LastByteAcked < NextByteExpected ≤ LastByteSent + 1
TCP sender (simplified)
NextSeqNum = InitialSeqNum + 1
SendBase = InitialSeqNum + 1   /* == LastByteAcked + 1 */

loop (forever) {
  switch (event)

  event: data received from application above
    1. create TCP segment with sequence number NextSeqNum
    2. if (timer currently not running)
       2.1. start timer (timeout after TimeOutInterval)
    3. pass segment to IP
    4. NextSeqNum = NextSeqNum + length(data)

  event: timer timeout
    1. retransmit not-yet-acknowledged segment with smallest sequence number
    2. start timer (timeout after TimeOutInterval)

  event: ACK received, with ACK field value of y
    1. if (y > SendBase) {
         SendBase = y
         if (there are currently not-yet-acknowledged segments)
           restart timer
       }
} /* end of loop forever */

Comment:
• SendBase - 1 is the last cumulatively ACKed byte, i.e., the receiver has received all bytes up to SendBase - 1 and is expecting the byte starting at SendBase
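The three sender events above can be sketched as a small Python class. This is only a toy model under stated assumptions: the class name SimpleTcpSender, the unacked dictionary, and the wire list standing in for "pass segment to IP" are hypothetical, and the timer is reduced to a boolean flag rather than a real clock.

```python
class SimpleTcpSender:
    """Toy model of the simplified sender events; not a real TCP stack."""

    def __init__(self, initial_seq: int):
        self.next_seq = initial_seq + 1
        self.send_base = initial_seq + 1   # == LastByteAcked + 1
        self.timer_running = False         # stand-in for the single retransmission timer
        self.unacked = {}                  # seq -> payload, not yet acknowledged
        self.wire = []                     # (seq, payload) pairs handed to "IP"

    def on_app_data(self, data: bytes) -> None:
        seq = self.next_seq                # 1. segment carries NextSeqNum
        self.unacked[seq] = data
        if not self.timer_running:         # 2. start timer only if it is not running
            self.timer_running = True
        self.wire.append((seq, data))      # 3. pass segment to IP
        self.next_seq += len(data)         # 4. advance NextSeqNum

    def on_timeout(self) -> None:
        if self.unacked:
            seq = min(self.unacked)        # smallest unacknowledged sequence number
            self.wire.append((seq, self.unacked[seq]))
            self.timer_running = True      # restart timer

    def on_ack(self, y: int) -> None:
        if y > self.send_base:
            self.send_base = y
            # keep only segments that still contain bytes at or beyond y
            self.unacked = {s: d for s, d in self.unacked.items() if s + len(d) > y}
            self.timer_running = bool(self.unacked)


sender = SimpleTcpSender(initial_seq=91)
sender.on_app_data(b"x" * 8)     # Seq=92, 8 bytes
sender.on_app_data(b"y" * 20)    # Seq=100, 20 bytes
sender.on_timeout()              # retransmits Seq=92 only
sender.on_ack(120)               # cumulative ACK covers both segments
```

After the final cumulative ACK the sketch leaves SendBase at 120 with nothing outstanding, so the timer flag is cleared.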
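TCP: retransmission scenarios
[Figure: two timelines between Host A and Host B.
Lost ACK scenario: A sends Seq=92, 8 bytes data; B's ACK=100 is lost (X); A's timer times out and A resends Seq=92, 8 bytes data; B replies ACK=100; SendBase = 100.
Premature timeout scenario: A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B replies ACK=100 and ACK=120; the Seq=92 timer expires before the ACKs arrive, so A resends Seq=92, 8 bytes data; B replies ACK=120; SendBase goes from 100 to 120 and stays at 120.]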
TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario between Host A and Host B. A sends Seq=92, 8 bytes data, then Seq=100, 20 bytes data; B's ACK=100 is lost (X), but ACK=120 arrives before the timeout, so nothing is retransmitted; SendBase = 120.]
TCP Receiver ACK generation [RFC 1122, RFC 2581]
Event at receiver: arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
TCP receiver action: delayed ACK. Wait up to 500 ms for next segment. If no next segment, send ACK (ACKs may be piggybacked)

Event at receiver: arrival of in-order segment with expected seq #; one other segment has ACK pending
TCP receiver action: immediately send single cumulative ACK, ACKing both in-order segments

Event at receiver: arrival of out-of-order segment with higher-than-expected seq #; gap detected
TCP receiver action: immediately send duplicate ACK, indicating seq # of next expected byte

Event at receiver: arrival of segment that partially or completely fills gap
TCP receiver action: immediately send ACK, provided that segment starts at lower end of gap
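The event/action pairs above can be read as a small decision function. The sketch below is an illustration only: the function name ack_action and its four parameters are hypothetical, and the branch for a segment below the expected sequence number is an assumption, since the table does not cover that case.

```python
def ack_action(seg_seq: int, next_expected: int,
               ack_pending: bool, gap_exists: bool) -> str:
    """Map an arriving segment to the receiver action listed in the table."""
    if seg_seq > next_expected:
        # Out-of-order segment above the expected byte: gap detected
        return "immediate duplicate ACK"
    if seg_seq < next_expected:
        # Not covered by the table; assumed here to re-ACK the expected byte
        return "immediate duplicate ACK"
    # From here on the segment starts exactly at the expected byte
    if gap_exists:
        # Segment starts at the lower end of an existing gap
        return "immediate ACK"
    if ack_pending:
        # A previous in-order segment is still waiting for its delayed ACK
        return "immediate cumulative ACK"
    return "delayed ACK (up to 500 ms)"


print(ack_action(100, 100, ack_pending=False, gap_exists=False))
```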
Fast Retransmit
Time-out period is often relatively long:
• long delay before resending lost packet
Detect lost segments via duplicate ACKs:
• Sender often sends many segments back-to-back
• If a segment is lost, there will likely be many duplicate ACKs
If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
• fast retransmit: resend the segment before the timer expires
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
  if (y > SendBase) {
    SendBase = y
    if (there are currently not-yet-acknowledged segments)
      start timer
  }
  else { /* y == SendBase: a duplicate ACK for an already ACKed segment. y cannot be smaller than SendBase */
    increment count of dup ACKs received for y
    if (count of dup ACKs received for y == 3) {
      resend segment with sequence number y == SendBase   /* fast retransmit */
    }
  }
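A minimal Python sketch of the duplicate-ACK counting in the algorithm above. The class name DupAckCounter and the returned strings are hypothetical, and resetting the counter when SendBase advances is an assumption the pseudocode leaves implicit.

```python
class DupAckCounter:
    """Counts duplicate ACKs and signals fast retransmit on the third one."""

    def __init__(self, send_base: int):
        self.send_base = send_base
        self.dup_acks = 0

    def on_ack(self, y: int) -> str:
        if y > self.send_base:
            # New data acknowledged: advance and forget old duplicates (assumed reset)
            self.send_base = y
            self.dup_acks = 0
            return "advance"
        # y == SendBase: duplicate ACK for already ACKed data
        self.dup_acks += 1
        if self.dup_acks == 3:
            return "fast_retransmit"   # resend segment starting at SendBase
        return "dup"


counter = DupAckCounter(send_base=100)
events = [counter.on_ack(100) for _ in range(3)]
events.append(counter.on_ack(120))
```

In this run the third duplicate ACK for byte 100 triggers the retransmission, and the later ACK for 120 moves SendBase forward.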
TCP Flow Control
Receive side of TCP connection has a receive buffer:
• app process may be slow at reading from buffer
Flow control:
• sender won’t overflow receiver’s buffer by transmitting too much, too fast
Speed-matching service:
• matching the send rate to the receiving app’s drain rate
TCP Flow control: how it works
Spare room in buffer:
RcvWindow = MaxRcvBuffer - [NextByteExpected - LastByteRead]
Rcvr advertises spare room by including the value of RcvWindow in segments
Sender limits unACKed data to RcvWindow:
• guarantees receive buffer doesn’t overflow
LastByteSent - LastByteAcked ≤ RcvWindow
SndWindow = RcvWindow - [LastByteSent - LastByteAcked]
[Figure: receive buffer of size MaxRcvBuffer with markers LastByteRead, NextByteExpected, and LastByteRcvd.]
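The two window formulas above can be tried out with a few numbers. The sketch below uses the formulas exactly as they appear on the slide; the function names and the sample byte counts are hypothetical, and clamping the sender window at zero is an added assumption.

```python
def rcv_window(max_rcv_buffer: int, next_byte_expected: int, last_byte_read: int) -> int:
    """RcvWindow = MaxRcvBuffer - [NextByteExpected - LastByteRead], as on the slide."""
    return max_rcv_buffer - (next_byte_expected - last_byte_read)


def snd_window(rcv_win: int, last_byte_sent: int, last_byte_acked: int) -> int:
    """SndWindow = RcvWindow - [LastByteSent - LastByteAcked], clamped at 0 (assumption)."""
    return max(0, rcv_win - (last_byte_sent - last_byte_acked))


# Receiver: 4096-byte buffer, application has read up to byte 1000,
# in-order data has arrived up to byte 3000 (next expected is 3001).
advertised = rcv_window(4096, 3001, 1000)

# Sender: 1500 bytes are in flight against that advertisement.
usable = snd_window(advertised, last_byte_sent=4500, last_byte_acked=3000)
```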
TCP Flow control Issues
What happens if the advertised window is 0?
• Receiver updates window when application reads data
• What if this update is lost? Deadlock
TCP persist timer
• Sender periodically sends window probe packets
• Receiver responds with ACK and up-to-date window advertisement
TCP flow control enhancements
Problem: (Clark, 1982)
• If the receiver advertises small increases in the receive window, then the sender may waste time sending lots of small packets
This problem is known as “Silly window syndrome”
• Receiver advertises one byte window
• Sender sends one byte packet (1 byte data, 40 byte header = 4000% overhead)
Solving Silly Window Syndrome
Receiver avoidance [Clark (1982)]
• Prevent receiver from advertising small windows
• Increase advertised receiver window by min(MSS, RecvBuffer/2)
Solving Silly Window Syndrome
Sender avoidance [Nagle’s algorithm (1984)]
• Prevent sender from unnecessarily sending small packets
How long does sender delay sending data?
• too long: hurts interactive applications
• too short: poor network utilization
• strategies: timer-based vs self-clocking
When application generates additional data:
• if it fills a max segment (and window open): send it
• else if there is unACKed data in transit: buffer it until ACK arrives
• else: send it
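The send/buffer rule above fits in a few lines of Python. This is a sketch of the decision only, assuming hypothetical parameter names (buffered_bytes, mss, window_bytes, unacked_in_flight); it does not model timers or the actual segment queue.

```python
def nagle_decision(buffered_bytes: int, mss: int,
                   window_bytes: int, unacked_in_flight: bool) -> str:
    """Return "send" or "buffer" following the rule stated above."""
    if buffered_bytes >= mss and window_bytes >= mss:
        return "send"      # a full segment fits in the open window
    if unacked_in_flight:
        return "buffer"    # wait for the ACK; its arrival clocks the next send
    return "send"          # nothing outstanding, so small data goes out at once


# An interactive application typing one byte at a time:
first_keystroke = nagle_decision(1, 1460, 8192, unacked_in_flight=False)
second_keystroke = nagle_decision(1, 1460, 8192, unacked_in_flight=True)
```

The first byte leaves immediately, while later bytes are coalesced until the outstanding ACK returns, which is the self-clocking strategy named above.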
Keeping the Pipe Full
16-bit AdvertisedWindow

Bandwidth             Delay x Bandwidth Product
T1 (1.5 Mbps)         18KB
Ethernet (10 Mbps)    122KB
T3 (45 Mbps)          549KB
FDDI (100 Mbps)       1.2MB
STS-3 (155 Mbps)      1.8MB
STS-12 (622 Mbps)     7.4MB
STS-24 (1.2 Gbps)     14.8MB
(assuming 100ms RTT)

TCP extension to allow window scaling
• Put in options field
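To see why a 16-bit window is too small for these links, the delay x bandwidth product can be computed directly. The sketch below is illustrative: the function names are hypothetical, link rates are the nominal values from the table, and the shift calculation assumes the window scale option multiplies the 16-bit field by a power of two.

```python
def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Delay x bandwidth product in bytes, using integer arithmetic."""
    return bandwidth_bps * rtt_ms // 8000   # bits/s * ms / (8 bits * 1000 ms/s)


def window_scale_shift(bdp: int, max_unscaled: int = 65535) -> int:
    """Smallest left shift of the 16-bit window that covers the given BDP."""
    shift = 0
    while (max_unscaled << shift) < bdp:
        shift += 1
    return shift


t1 = bdp_bytes(1_500_000, 100)          # T1 at 100 ms RTT
sts24 = bdp_bytes(1_200_000_000, 100)   # STS-24 at 100 ms RTT
t1_shift = window_scale_shift(t1)
sts24_shift = window_scale_shift(sts24)
```

The T1 product of 18750 bytes fits in an unscaled 65535-byte window, while the STS-24 product of 15,000,000 bytes needs the window shifted left by 8 bits in this sketch.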
TCP Round Trip Time Estimation and Setting the Timeout
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT, but RTT varies
• too short: premature timeout, unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt; ignore retransmissions
• SampleRTT will vary, want estimated RTT “smoother”: average several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1-α)*EstimatedRTT + α*SampleRTT
Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: α = 0.125
e.g.: Ai is the estimated RTT at time i and Mi is the sampled RTT at time i
A0 = M0
A1 = (1-α) M0 + α M1
A2 = (1-α)^2 M0 + α(1-α) M1 + α M2
….
Example RTT estimation: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr
[Figure: SampleRTT and EstimatedRTT plotted as RTT (milliseconds, 100 to 350) versus time (seconds).]
TCP Round Trip Time and Timeout
Setting the timeout
EstimatedRTT plus “safety margin”
• large variation in EstimatedRTT -> larger safety margin
First, estimate how much SampleRTT deviates from EstimatedRTT:
DevRTT = (1-β)*DevRTT + β*|SampleRTT-EstimatedRTT|
(typically, β = 0.25)
Then set timeout interval:
TimeoutInterval = EstimatedRTT + 4*DevRTT
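The RTT estimate, the deviation, and the timeout interval can be combined in one small estimator. The sketch below rests on assumptions the slides leave open: the class name RttEstimator is hypothetical, DevRTT starts at 0, and the deviation is updated against the EstimatedRTT from before the new sample is folded in.

```python
class RttEstimator:
    """EWMA-based RTT estimator with the weights given on the slides."""

    def __init__(self, first_sample: float, alpha: float = 0.125, beta: float = 0.25):
        self.alpha = alpha
        self.beta = beta
        self.estimated_rtt = first_sample   # A0 = M0
        self.dev_rtt = 0.0                  # assumed starting deviation

    def update(self, sample_rtt: float) -> float:
        # Deviation first, against the previous estimate (assumed ordering)
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.alpha) * self.estimated_rtt
                              + self.alpha * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


est = RttEstimator(first_sample=100.0)    # milliseconds
timeout = est.update(200.0)               # one much slower sample arrives
```

A single 200 ms sample after a 100 ms start moves the estimate only to 112.5 ms, but the deviation term pushes the timeout out to 212.5 ms.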
TCP Congestion Control
Congestion Control Overview
Congestion: informally: “too many sources sending too much data too fast for network to handle”
Signal and detect congestion
Policy for source to adjust transmission rate to match network bandwidth capacity
• Decrease rate upon congestion signal
• Increase for utilization
Initialization to reach steady state
Network Utilization
Queuing delay (theoretically) could approach infinity with increased load
Network Power (ratio of throughput to delay)
[Figure: throughput/delay versus load, with the knee at the optimal load; queuing delay versus load.]
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
• no explicit feedback from network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP
Network-assisted congestion control:
• routers provide feedback to end systems
• single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
• explicit rate sender should send at
TCP Congestion Control
End-end control (no network assistance)
Sender limits transmission:
sndWindow = LastByteSent - LastByteAcked ≤ min(cwnd, rcvWin)
cwnd is a dynamic function of perceived network congestion
How does sender perceive congestion?
• loss event = timeout or 3 duplicate ACKs
• TCP sender reduces rate (cwnd) after loss event
TCP Congestion Control Outline
Basic Idea: Probe (test) the current available bandwidth in the network & adjust your sending rate accordingly
1. Start with a very small sending rate: 1 packet
2. Increase your sending rate as long as no congestion is detected
• How do you detect “no congestion”? No packet loss
• How do you increase your sending rate?
3. Decrease your sending rate when you detect congestion
• How do you detect congestion? Packet loss (timeout, 3 duplicate ACKs)
• How do you decrease your sending rate?
TCP Slow Start
When connection begins, cwnd = 1 MSS
• Example: MSS = 500 bytes & RTT = 200 msec
• initial rate = 20 kbps
Available bandwidth may be >> MSS/RTT
• desirable to quickly ramp up to a respectable rate
When connection begins, increase rate exponentially fast until first loss event
When loss occurs (congestion signal), set cwnd to 1 MSS and restart with slow start
TCP Slow Start (more)
When connection begins, increase rate exponentially until first loss event:
• double cwnd every RTT
• done by incrementing cwnd by 1 MSS for every ACK received
Summary: initial rate is slow but ramps up exponentially fast
When the first packet loss occurs (either a timeout occurred or 3 duplicate ACKs were received), set cwnd to 1 MSS and start over.
[Figure: Host A / Host B timeline. A sends one segment in the first RTT, two segments in the second, and four segments in the third.]
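Adding 1 MSS per ACK doubles the window every RTT, which a short simulation makes visible. The sketch below is an idealized model: the function names are hypothetical, cwnd is counted in segments, and it assumes one ACK per segment with no loss and no delayed ACKs.

```python
def initial_rate_bps(mss_bytes: int, rtt_ms: int) -> int:
    """Sending rate with cwnd = 1 MSS, i.e. MSS/RTT, in bits per second."""
    return mss_bytes * 8 * 1000 // rtt_ms


def slow_start_rounds(rounds: int, cwnd: int = 1) -> list:
    """cwnd (in segments) at the start of each RTT during slow start."""
    history = [cwnd]
    for _ in range(rounds):
        acks = cwnd                 # every segment sent this round is ACKed
        for _ in range(acks):
            cwnd += 1               # +1 MSS per ACK received
        history.append(cwnd)
    return history


rate = initial_rate_bps(mss_bytes=500, rtt_ms=200)
growth = slow_start_rounds(3)
```

With MSS = 500 bytes and RTT = 200 msec the first function reproduces the 20 kbps initial rate, and three rounds take the window from 1 to 8 segments.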
TCP After the First Packet Loss
Question: Should we continue doing slow start throughout the lifetime of the TCP connection?
Why did we increase cwnd exponentially at the beginning?
• Because we had no idea how much of our traffic the network can carry, so we needed to probe fast to figure it out
What do we know at the first packet loss event?
• That the network cannot carry our traffic at the rate we had at the time the packet loss occurred
• What was our rate at the time the packet loss occurred? cwnd/RTT
Refinement idea: Use the knowledge we obtained at the time of the packet loss for further refinement of the slow start algorithm.
How?
• Keep a threshold value, ssthresh, and set it to 1/2 of cwnd just before the loss event.
• cwnd increases exponentially until it reaches ssthresh, and linearly afterwards; this is called congestion avoidance
Congestion Avoidance: TCP Tahoe
Q: When should the exponential increase switch to linear?
A: When cwnd gets to 1/2 of its value before timeout.
Implementation:
• Threshold variable ssthresh
• Set to some large value, e.g., 65K, when the connection is established
• At loss event, ssthresh is set to 1/2 of cwnd just before loss event
[Figure: TCP Tahoe. Congestion window size (segments, 0 to 14) versus transmission round (1 to 15), with the threshold marked.]
Per ACK received:
if (cwnd < ssthresh) cwnd += 1;
else cwnd += 1/cwnd;
Figure assumes that the first packet loss has occurred when cwnd was 16, so ssthresh = 8, and cwnd = 1
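The window growth described for the figure can be traced round by round. The sketch below works at per-round granularity rather than per ACK, which is an approximation: the function name tahoe_trace is hypothetical, and capping the doubled window at ssthresh is an assumption made so the switch to linear growth happens exactly at the threshold.

```python
def tahoe_trace(rounds: int, ssthresh: int = 8, cwnd: int = 1) -> list:
    """cwnd (in segments) per transmission round, with no further loss."""
    trace = [cwnd]
    for _ in range(rounds):
        if cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)   # slow start: double per round
        else:
            cwnd += 1                        # congestion avoidance: +1 per round
        trace.append(cwnd)
    return trace


trace = tahoe_trace(6)   # starts from cwnd = 1 with ssthresh = 8
```

Starting from cwnd = 1 with ssthresh = 8, the trace doubles up to 8 and then climbs by one segment per round.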
Linear Window Increase Example During Congestion Avoidance
Linear congestion window increase during the congestion avoidance phase
• Window is increased by 1 packet after a full window size of packets is ACKed
[Figure: Host A / Host B timeline. A sends one segment in the first RTT, then two, three, and four segments in the following RTTs.]
Towards a better Congestion Control Algorithm: TCP Reno
Question: Should we set cwnd to 1 both
• after a timeout, and
• when 3 duplicate ACKs are received?
Answer:
A timeout before 3 dup ACKs is “alarming”
• So set cwnd to 1
BUT 3 dup ACKs before a timeout indicate that the network is capable of delivering some segments
• Why not set cwnd to a bigger value rather than 1?
• TCP Reno: set cwnd to half of its value, which is equal to ssthresh, and enter the linear increase phase
TCP Reno Refinement
After 3 dup ACKs:
• cwnd is cut in half
• window then grows linearly
• this is called “fast recovery”
But after a timeout event:
• cwnd is instead set to 1 MSS
• window then grows exponentially to a threshold, then grows linearly
TCP Reno Refinement (more)
[Figure: TCP Tahoe and TCP Reno. Congestion window size (segments, 0 to 14) versus transmission round (1 to 15), with the threshold marked.]
At time 8, when cwnd = 12, a packet loss is detected by 3 duplicate ACKs.
• Tahoe sets cwnd to 1 unconditionally
• Reno sets cwnd to half of its current value, which is 6
Notice that ssthresh is set to 6 in both cases
Also notice that if this was a “timeout”, then Reno would also set cwnd to 1
Summary: TCP Congestion Control
When cwnd is below ssthresh, sender is in slow-start phase, window grows exponentially.
When cwnd is above ssthresh, sender is in congestion-avoidance phase, window grows linearly.
When a triple duplicate ACK occurs, ssthresh is set to cwnd/2 and cwnd is set to ssthresh.
When a timeout occurs, ssthresh is set to cwnd/2 and cwnd is set to 1 MSS.
Congestion window always oscillates!
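The loss reactions summarized above, including the Tahoe versus Reno difference, can be written as one function. This is a sketch with hypothetical names (on_loss, the event and variant strings); cwnd is counted in segments, halving uses integer division, and the floor of 1 segment on ssthresh is an added assumption.

```python
def on_loss(cwnd: int, event: str, variant: str = "reno") -> tuple:
    """Return (new_cwnd, new_ssthresh) after a loss event."""
    if event not in ("timeout", "3dupacks"):
        raise ValueError("event must be 'timeout' or '3dupacks'")
    ssthresh = max(cwnd // 2, 1)        # half of cwnd just before the loss
    if event == "timeout" or variant == "tahoe":
        return 1, ssthresh              # restart from 1 MSS with slow start
    return ssthresh, ssthresh           # Reno on 3 dup ACKs: continue linearly


reno_dup = on_loss(12, "3dupacks", variant="reno")
tahoe_dup = on_loss(12, "3dupacks", variant="tahoe")
reno_timeout = on_loss(12, "timeout", variant="reno")
```

With cwnd = 12 this matches the example from the slides: both variants set ssthresh to 6, Reno keeps cwnd at 6 after 3 duplicate ACKs, and a timeout sends either variant back to 1.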
Steady State TCP Modeling
[Figure: congestion window versus time for a long-lived TCP connection, with congestion window axis marks at 8 Kbytes, 16 Kbytes, and 24 Kbytes.]
Figure above shows the behavior of a TCP connection in steady state
Additive Increase/Multiplicative Decrease (AIMD)
Further improving TCP Congestion Control Algorithm: TCP Vegas
Detect congestion before packet loss occurs by observing RTTs
• The longer the RTT, the greater the congestion in the routers
Lower the transmission rate linearly when packet loss is imminent