reliable data transfer in transmission control protocol (tcp)

38
Reliable Data Transfer in Transmission Control Protocol (TCP)

Upload: gavin-cole

Post on 17-Jan-2016

223 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Reliable Data Transfer in Transmission Control Protocol (TCP)

Reliable Data Transfer inTransmission Control Protocol

(TCP)

Page 2: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Data Transfer

TCP creates reliable, ordered data transfer service on top of IP’s unreliable service

Pipelined segments Cumulative ACKs Uses single

retransmission timer

Retransmissions are triggered by: timeout events duplicate ACKs

Window size controlled by receiver and inferred from network Flow control Congestion control

Page 3: Reliable Data Transfer in Transmission Control Protocol (TCP)

ReceiverReceiverSenderSender

Sender/Receiver Overview

… …

Sent & Acked Sent Not Acked

OK to Send Not Usable

… …

Last frame acceptable

Receiver window

Last ACK received Last Frame Sent

Received & Acked Acceptable Packet

Not Usable

Sender window

Next frame expected

Page 4: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Sender/Receiver Invariants

Sending application

LastByteWritten

TCP

LastByteSentLastByteAcked

Receiving application

LastByteRead

TCP

LastByteRcvd

NextByteExpected

Snd: LastByteAcked ≤ LastByteSent ≤ LastByteWrittenRcv: LastByteRead < NextByteExpected ≤ LastByteRcvd + 1 LastByteAcked < NextByteExpected ≤ LastByteSent+1

Page 5: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP sender (simplified)NextSeqNum = InitialSeqNum + 1 SendBase = InitialSeqNum + 1 /* == LastByteAcked + 1 */

loop (forever) { switch(event) event: data received from application above 1. create TCP segment with sequence number NextSeqNum 2. if (timer currently not running) 2.1. start timer – timeout after TimeOutInterval later 3. pass segment to IP 4. NextSeqNum = NextSeqNum + length(data)

event: timer timeout 1. retransmit not-yet-acknowledged segment with smallest sequence number 2. start timer – timeout after TimeOutInterval later

event: ACK received, with ACK field value of y 1. if (y > SendBase) { SendBase = y if (there are currently not-yet-acknowledged segments) restart timer } } /* end of loop forever */

Comment:• SendBase-1: last cumulatively ack’ed byte, i.e., the receiver has received bytes up to SendBase –1 and is expecting the byte starting at SendBase

Page 6: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP: retransmission scenarios

Host A

Seq=100, 20 bytes data

ACK=100

time premature timeout

Host B

Seq=92, 8 bytes data

ACK=120

Seq=92, 8 bytes data

Seq=

92

tim

eout

ACK=120

Host A

Seq=92, 8 bytes data

ACK=100

loss

tim

eout

lost ACK scenario

Host B

X

Seq=92, 8 bytes data

ACK=100

time

SendBase= 100

SendBase= 120

SendBase= 120

Sendbase= 100

Page 7: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP retransmission scenarios (more)

Host A

Seq=92, 8 bytes data

ACK=100

loss

tim

eout

Cumulative ACK scenario

Host B

X

Seq=100, 20 bytes data

ACK=120

time

SendBase= 120

Page 8: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Receiver ACK generation [RFC 1122, RFC 2581]

Event at Receiver

Arrival of in-order segment withexpected seq #. All data up toexpected seq # already ACKed

Arrival of in-order segment withexpected seq #. One other segment has ACK pending

Arrival of out-of-order segmenthigher-than-expect seq. # .Gap detected

Arrival of segment that partially or completely fills gap

TCP Receiver action

Delayed ACK. Wait up to 500msfor next segment. If no next segment,send ACK (ACKs maybe piggybacked)

Immediately send single cumulative ACK, ACKing both in-order segments

Immediately send duplicate ACK, indicating seq. # of next expected byte

Immediate send ACK, provided thatsegment starts at lower end of gap

Page 9: Reliable Data Transfer in Transmission Control Protocol (TCP)

Fast Retransmit Time-out period

often relatively long: long delay before

resending lost packet

Detect lost segments via duplicate ACKs. Sender often sends

many segments back-to-back

If segment is lost, there will likely be many duplicate ACKs.

If sender receives 3 ACKs for the same data, it supposes that segment after ACKed data was lost: fast retransmit: resend

segment before timer expires

Page 10: Reliable Data Transfer in Transmission Control Protocol (TCP)

event: ACK received, with ACK field value of y if (y > SendBase) { SendBase = y if (there are currently not-yet-acknowledged segments) start timer } else { /* y == SendBase. y cannot be smaller than SendBase */ increment count of dup ACKs received for y if (count of dup ACKs received for y == 3) { resend segment with sequence number y == SendBase }

Fast retransmit algorithm:

a duplicate ACK for already ACKed segment

fast retransmit

Page 11: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Flow Control

Page 12: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Flow Control

receive side of TCP connection has a receive buffer:

speed-matching service: matching the send rate to the receiving app’s drain rate app process may be

slow at reading from buffer

sender won’t overflow

receiver’s buffer bytransmitting too

much, too fast

flow control

Page 13: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Flow control: how it works

spare room in buffer

RcvWindow

= MaxRcvBuffer -

[NextByteExpectd - LastByteRead]

Rcvr advertises spare room by including value of RcvWindow in segments

Sender limits unACKed data to RcvWindow guarantees receive buffer

doesn’t overflow

LastByteSent – LastByteAcked ≤ RcvWindow

SndWindow = RcvWindow -[LastByteSent – LastByteAcked]

LastByteRead

LastByteRcvd

NextByteExpected

MaxRcvBuffer

Page 14: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Flow control Issues

What happens if advertised window is 0? Receiver updates window when application

reads data What if this update is lost?

• Deadlock

TCP Persist timer Sender periodically sends window probe

packets Receiver responds with ACK and up-to-date

window advertisement

Page 15: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP flow control enhancements

Problem: (Clark, 1982) If receiver advertises small increases in the

receive window then the sender may waste time sending lots of small packets

This problem is known as “Silly window syndrome”

• Receiver advertises one byte window• Sender sends one byte packet (1 byte data, 40 byte

header = 4000% overhead)

Page 16: Reliable Data Transfer in Transmission Control Protocol (TCP)

Solving Silly Window Syndrome

Receiver avoidance [Clark (1982)] Prevent receiver from advertising small windows Increase advertised receiver window by

min(MSS, RecvBuffer/2)

Page 17: Reliable Data Transfer in Transmission Control Protocol (TCP)

Solving Silly Window Syndrome

Sender Avoidance [Nagle’s algorithm (1984)] prevent sender from unnecessarily sending small

packets How long does sender delay sending data?

too long: hurts interactive applications too short: poor network utilization strategies: timer-based vs self-clocking

When application generates additional data if fills a max segment (and window open): send it else

• if there is unack’ed data in transit: buffer it until ACK arrives

• else: send it

Page 18: Reliable Data Transfer in Transmission Control Protocol (TCP)

Keeping the Pipe Full 16-bit AdvertisedWindow

Bandwidth Delay x Bandwidth ProductT1 (1.5 Mbps) 18KBEthernet (10 Mbps) 122KBT3 (45 Mbps) 549KBFDDI (100 Mbps) 1.2MBSTS-3 (155 Mbps) 1.8MBSTS-12 (622 Mbps) 7.4MBSTS-24 (1.2 Gbps) 14.8MB

assuming 100ms RTT

TCP Extension to allow window scaling Put is options field

Page 19: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Round Trip Time Estimation and Setting the Timeout

Page 20: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Round Trip Time and Timeout

Q: how to set TCP timeout value?

longer than RTT but RTT varies

too short: premature timeout unnecessary

retransmissions too long: slow

reaction to segment loss

Q: how to estimate RTT? SampleRTT: measured time

from segment transmission until ACK receipt ignore retransmissions

SampleRTT will vary, want estimated RTT “smoother” average several recent

measurements, not just current SampleRTT

Page 21: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Round Trip Time and Timeout

EstimatedRTT = (1- )*EstimatedRTT + *SampleRTT

Exponential weighted moving average influence of past sample decreases exponentially

fast typical value: = 0.125

e.g: Ai Estimated RTT at time i and M sampled RTT at time i

A0 = M0

A1 = (1- ) M0 + M1

A2 = (1- )2 M0 + (1- ) M1 + M2

….

Page 22: Reliable Data Transfer in Transmission Control Protocol (TCP)

Example RTT estimation:RTT: gaia.cs.umass.edu to fantasia.eurecom.fr

100

150

200

250

300

350

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

time (seconnds)

RTT

(mill

isec

onds

)

SampleRTT Estimated RTT

Page 23: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Round Trip Time and Timeout

Setting the timeout EstimtedRTT plus “safety margin”

large variation in EstimatedRTT -> larger safety margin first estimate of how much SampleRTT deviates from

EstimatedRTT:

TimeoutInterval = EstimatedRTT + 4*DevRTT

DevRTT = (1-)*DevRTT + *|SampleRTT-EstimatedRTT|

(typically, = 0.25)

Then set timeout interval:

Page 24: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Congestion Control

Page 25: Reliable Data Transfer in Transmission Control Protocol (TCP)

Congestion Control Overview

Congestion: informally: “too many sources sending too much data too fast for network to handle”

Signal and detect congestion Policy for source to adjust transmission rate to

match network bandwidth capacity Decrease rate upon congestion signal Increase for utilization

Initialization to reach steady state

Page 26: Reliable Data Transfer in Transmission Control Protocol (TCP)

Network Utilization

Queuing delay (theoretically) could approach infinity with increased load

Network Power (ratio of throughput to delay)

Optimalload Load

Th

rou

ghp

ut/d

elay

Knee

Queuing delay

Page 27: Reliable Data Transfer in Transmission Control Protocol (TCP)

Approaches towards congestion control

End-end congestion control:

no explicit feedback from network

congestion inferred from end-system observed loss, delay

approach taken by TCP

Network-assisted congestion control:

routers provide feedback to end systems single bit indicating

congestion (SNA, DECbit, TCP/IP ECN, ATM)

explicit rate sender should send at

Two broad approaches towards congestion control:

Page 28: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Congestion Control

end-end control (no network assistance)

sender limits transmission: sndWindow

= LastByteSent-LastByteAcked

min(cwnd, rcvWin) Cwnd is a dynamic function of

perceived network congestion

How does sender perceive congestion?

loss event = timeout or 3 duplicate acks

TCP sender reduces rate (cwnd) after loss event

Page 29: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Congestion Control Outline Basic Idea: Probe (test) the current

available bandwidth in the network & adjust your sending rate accordingly

1. Start with a very small sending rate -- 1 packet2. Increase your sending rate as long as no

congestion is detected• How do you detect “no congestion”? – No packet loss• How do you increase your sending rate?

3. Decrease your sending rate when you detect congestion• How do you detect congestion? – Packet loss

(timeout, 3 duplicate ACK)• How do you decrease your sending rate?

Page 30: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Slow Start When connection begins, cwnd = 1 MSS

Example: MSS = 500 bytes & RTT = 200 msec initial rate = 20 kbps

available bandwidth may be >> MSS/RTT desirable to quickly ramp up to respectable rate

When connection begins, increase rate exponentially fast until first loss event

When loss occurs (congestion signal), set cwnd to 1 MSS and re-start with slow start

Page 31: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Slow Start (more) When connection begins,

increase rate exponentially until first loss event: double cwnd every RTT done by incrementing cwnd by 1

MSS for every ACK received

Summary: initial rate is slow but ramps up

exponentially fast When the first packet loss occurs

(either a timeout occurred or 3 duplicate ACKs were received), set cwnd to 1 MSS and start over.

Host A

one segment

RTT

Host B

time

two segments

four segments

Page 32: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP After the First Packet LossQuestion: Should we continue doing slow start

throughout the lifetime of the TCP connection? Why did we increase cwnd exponentially at the beginning?

• Because we had no idea how much of our traffic the network can carry, so we needed to probe fast to figure it out

What do we know at the first packet loss event?• That the network cannot carry our traffic at the rate we had

at the time the packet loss occurred– What was our rate at the time the packet loss occurred?

» Cwnd/RTT

Refinement Idea: Use the knowledge we obtained at the time of the packet loss for further refinement of the slow start algorithm. How? Keep a threshold value, ssthresh, and set to 1/2 of cwnd

just before loss event. Cnwd increases exponentially until it reaches ssthresh,

and linearly afterwards – this is called congestion avoidance

Page 33: Reliable Data Transfer in Transmission Control Protocol (TCP)

Congestion Avoidance – TCP Tahoe

Q: When should the exponential increase switch to linear?

A: When cwnd gets to 1/2 of its value before timeout.

Implementation: Threshold variable ssthresh Set to some large value, i.e.,

65K, when the connection is established

At loss event, ssthresh is set to 1/2 of cwnd just before loss event

0

2

4

6

8

10

12

14

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Transmission round

co

ng

es

tio

n w

ind

ow

siz

e

(se

gm

en

ts)

threshold

TCP Tahoe

if (cwnd < ssthresh) cwnd +=1;else cwnd += 1/cwnd;

Figure assumes that the first packet loss has occurred when cwnd was 16, so sshtresh = 8, and cwnd = 1

Page 34: Reliable Data Transfer in Transmission Control Protocol (TCP)

Linear Window Increase Example

During Congestion Avoidance

Linear Congestion Window Increase during Congestion Avoidance Phase Window is increased by 1

packet after a full window size of packets is ACKed

Host A

one segment

RTT

Host B

time

two segments

three segments

four segments

Page 35: Reliable Data Transfer in Transmission Control Protocol (TCP)

Towards a better Congestion Control Algorithm – TCP Reno

Question: Should we set cwnd to 1 both after After a timeout When 3 duplicate ACKs are received

Answer: timeout before 3 dup ACKs is “alarming”.

• So set cwnd to 1 BUT 3 dup ACKs before timeout indicates that the network is

capable of delivering some segments• Why not set cwnd to a bigger value rather than 1?• TCP Reno: Set cwnd to half of its value, which is equal to

sshtresh, and enter linear increase phase

Page 36: Reliable Data Transfer in Transmission Control Protocol (TCP)

TCP Reno Refinement (more)

0

2

4

6

8

10

12

14

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Transmission round

co

ng

es

tio

n w

ind

ow

siz

e

(se

gm

en

ts)

threshold

TCP Tahoe

TCP Reno

RCP Reno Refinement After 3 dup ACKs:

cwnd is cut in half window then

grows linearly This is called “fast

recovery”

But after timeout event: cwnd instead set

to 1 MSS; window then

grows exponentially

to a threshold, then grows linearly

At time 8, when cwnd = 12, a packet loss is detected by 3 duplicate ACKs.

Tahoe sets cwnd to 1 unconditionally

Reno sets cwnd to half of its current value, which is 6

Notice that ssthresh is set to 6 in both cases

Also notice that if this was a “timeout”, then Reno would also set cwnd to 1

Page 37: Reliable Data Transfer in Transmission Control Protocol (TCP)

Summary: TCP Congestion Control

When cwnd is below ssthresh, sender in slow-start phase, window grows exponentially.

When cwnd is above sshthresh, sender is in congestion-avoidance phase, window grows linearly.

When a triple duplicate ACK occurs, ssthresh set to cwnd/2 and cwnd set to ssthresh.

When timeout occurs, ssthresh set to cwnd/2 and cwnd is set to 1 MSS.

Congestion window always oscillates !

Page 38: Reliable Data Transfer in Transmission Control Protocol (TCP)

Steady State TCP Modeling

8 Kbytes

16 Kbytes

24 Kbytes

time

congestionwindow

Long-lived TCP connection

Figure above shows the behavior of a TCP connection in steady state Additive Increase/Multiplicative Decrease (AIMD)

Further improving TCP Congestion Control Algorithm --TCP Vegas Detect congestion before packet loss occurs by observing RTTs

• Longer RTT, greater the congestion in the routers Lower the transmission rate linearly when packet loss is

imminent