transport: tcp manpreet singh (slides borrowed from various sources on the web)

Transport: TCP

Manpreet Singh

(Slides borrowed from various sources on the web)

Announcements (1/2)

Everybody needs to join the class mailing list; otherwise I can't communicate class information.

Check the class archives to see if someone else has picked the same lecture or TCP application

We have a group of machines you can use for simulation (snoopy, linus, etc.). You need CSUG accounts to access these machines. We'll dig up more machines for those who want to do kernel hacking.

Announcements (2/2)

We need a volunteer to give the "post-modern" E2E lecture on 9/9 (in class...).

Students in the non-research track will have to give an initial demo by 11/9. Most of the functionality should be in place by then. This allows us to give feedback and gives you time to do performance measurements.

Roadmap

Why is TCP fair?

Loss-based congestion schemes: Tahoe, Reno, NewReno, SACK

Delay-based congestion control (Vegas)

Modeling TCP throughput

Equation-based congestion control

The Desired Properties of a Congestion Control Scheme

Efficiency (high utilization)

Optimality (high throughput, utility)

Fairness (resource sharing)

Distributedness (no central knowledge for scalability)

Convergence and stability (fast convergence after disturbance, low oscillation)

TCP Fairness

Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of the link capacity

TCP congestion avoidance:

AIMD: additive increase, multiplicative decrease

increase window by 1 per RTT

decrease window by a factor of 2 on a loss event

AIMD

[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?

Two competing sessions: additive increase gives a slope of 1 as throughput increases; multiplicative decrease reduces throughput proportionally.

[Figure: connection 2 throughput vs. connection 1 throughput, both axes running to R, with the equal-bandwidth-share line. The trajectory alternates between congestion avoidance (additive increase) and loss (window decreased by a factor of 2).]
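A minimal numeric sketch of the fairness argument, assuming a simplified fluid model in which both flows have the same RTT and see every loss at the same time (synchronized losses, which real networks do not guarantee). Rates are in arbitrary units and `aimd_two_flows` is a hypothetical helper. Additive increase leaves the gap between the flows unchanged, while each multiplicative decrease halves it, so the rates drift toward the equal share.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Simulate two AIMD flows sharing one bottleneck (synchronized losses).

    Each round is one RTT: both flows add 1 (additive increase); if the
    combined rate exceeds capacity, both halve (multiplicative decrease).
    Returns the list of (x1, x2) pairs after each round.
    """
    trace = []
    for _ in range(rounds):
        x1 += 1.0
        x2 += 1.0
        if x1 + x2 > capacity:      # loss event seen by both flows
            x1 /= 2.0
            x2 /= 2.0
        trace.append((x1, x2))
    return trace


if __name__ == "__main__":
    trace = aimd_two_flows(x1=80.0, x2=10.0, capacity=100.0, rounds=200)
    x1, x2 = trace[-1]
    print(f"final rates: {x1:.2f} {x2:.2f}, gap {abs(x1 - x2):.4f}")
```

Starting from a 70-unit gap, each loss event halves the difference, so after a few dozen RTTs the two flows are within a fraction of a unit of each other.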

Loss vs. Delay as a Signal?

[Figure: TCP oscillation]

Loss is a binary signal; delay is a multi-bit signal.

Simulation-based Comparisons of Tahoe, Reno, and SACK TCP

Kevin Fall, Sally Floyd

Introduction

SACK compared with Tahoe, Reno and New-Reno

Simulations designed to highlight performance differences with and without SACK

Comparison

Tahoe: slow start, congestion avoidance, and fast retransmit

Reno: Tahoe + fast recovery

New-Reno: Reno with modified fast recovery

SACK: Reno + selective ACKs

TCP Slowstart

exponential increase (per RTT) in window size (not so slow!)

loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)

initialize: Congwin = 1
for (each segment ACKed)
    Congwin++
until (loss event OR Congwin > threshold)

Slowstart algorithm (non-linear phase)

[Figure: slow start timeline between Host A and Host B: one segment in the first RTT, then two segments, then four segments]
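The per-RTT doubling can be traced with a short sketch. It assumes no losses, one ACK per segment, and that all ACKs for a window arrive within one RTT; `slow_start_rtts` is a hypothetical helper and cwnd/threshold are counted in segments. The per-ACK check mirrors the `until (... Congwin > threshold)` condition in the pseudocode.

```python
def slow_start_rtts(threshold):
    """Return cwnd (segments) after each RTT of slow start, assuming no loss.

    cwnd starts at 1 and grows by 1 per ACKed segment, so a full window of
    ACKs doubles it each RTT. Growth stops as soon as cwnd > threshold.
    """
    cwnd = 1
    history = [cwnd]
    while cwnd <= threshold:
        acks_this_rtt = cwnd          # one ACK per segment sent in this RTT
        for _ in range(acks_this_rtt):
            cwnd += 1                 # Congwin++ for each segment ACKed
            if cwnd > threshold:      # leave slow start immediately
                break
        history.append(cwnd)
    return history


print(slow_start_rtts(threshold=16))  # [1, 2, 4, 8, 16, 17]
```

The window goes 1, 2, 4, 8, 16 and then stops one ACK past the threshold, where congestion avoidance takes over.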

TCP Congestion Avoidance

/* slowstart is over */
/* Congwin > threshold */
until (loss event) {
    every w segments ACKed: Congwin++
}
threshold = Congwin/2
Congwin = 1
perform slowstart (1)

Congestion avoidance (linear phase)

(1) TCP Reno skips slowstart (it performs fast recovery) after three duplicate ACKs.

Fast Retransmit

Receiving a small number of duplicate ACKs (3) signals packet loss

Lost packet can be retransmitted before timeout

This improves channel utilization
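A sender-side sketch of the duplicate-ACK counting behind fast retransmit. It assumes cumulative ACKs arrive in order and ignores the extra conditions real TCP uses when classifying an ACK as a duplicate (for example, no data and no window update); `fast_retransmit_points` is a hypothetical helper.

```python
DUP_ACK_THRESHOLD = 3  # "small number of duplicate ACKs (3)"


def fast_retransmit_points(acks):
    """Return the sequence numbers that would be fast-retransmitted.

    acks is the stream of cumulative ACK numbers seen by the sender. The
    segment starting at the repeated ACK number is retransmitted on the
    third duplicate ACK, without waiting for the retransmission timeout.
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                retransmits.append(ack)
        else:                        # new cumulative ACK: reset the counter
            last_ack = ack
            dup_count = 0
    return retransmits


print(fast_retransmit_points([1000, 1500, 1500, 1500, 1500, 3000]))  # [1500]
```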

TCP/Reno Congestion Control

Initially:
    cwnd = 1;
    ssthresh = infinite (64K);

For each newly ACKed segment:
    if (cwnd < ssthresh) /* slow start */
        cwnd = cwnd + 1;
    else /* congestion avoidance; cwnd increases (approx.) by 1 per RTT */
        cwnd += 1/cwnd;

Triple-duplicate ACKs: /* multiplicative decrease */
    cwnd = ssthresh = cwnd/2;

Timeout:
    ssthresh = cwnd/2;
    cwnd = 1;

(if already timed out, double timeout value; this is called exponential backoff)
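The slide's rules can be collected into a small state object. This is a sketch, not a real TCP stack: windows are counted in segments, `ssthresh` starts at infinity as the slide says, the initial timeout value is a hypothetical 1 s, and the lower bounds that real implementations put on ssthresh are not modeled. `RenoWindow` is a hypothetical name.

```python
import math


class RenoWindow:
    """cwnd/ssthresh rules from the TCP/Reno slide, counted in segments."""

    def __init__(self, initial_rto=1.0):
        self.cwnd = 1.0
        self.ssthresh = math.inf     # "infinite (64K)" on the slide
        self.rto = initial_rto       # seconds; hypothetical starting value
        self.timed_out = False       # True if the last event was a timeout

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:   # slow start
            self.cwnd += 1
        else:                           # congestion avoidance: about +1 per RTT
            self.cwnd += 1 / self.cwnd
        self.timed_out = False

    def on_triple_dup_ack(self):        # multiplicative decrease
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh

    def on_timeout(self):
        if self.timed_out:              # timed out again: exponential backoff
            self.rto *= 2
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.timed_out = True


w = RenoWindow()
for _ in range(7):
    w.on_new_ack()                      # slow start: 1 -> 8
w.on_triple_dup_ack()                   # halve: cwnd = ssthresh = 4
print(w.cwnd, w.ssthresh)               # 4.0 4.0
```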

TCP/Reno: Big Picture

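[Figure: cwnd vs. time for TCP/Reno. Slow start runs up to ssthresh, followed by congestion avoidance; each triple-duplicate ACK (TD) halves cwnd and congestion avoidance resumes, while a timeout (TO) drops cwnd back into slow start. ssthresh is reset at each loss. TD: triple duplicate acknowledgements; TO: timeout.]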

Tahoe + Fast Recovery

Fast Recovery

Observation: Each duplicate ACK indicates some packet has left pipe

[Figure: effect of one lost packet on the window. Old cwnd; new cwnd = (old cwnd)/2. The left edge of the window stays fixed until an ACK is received, and the usable window is increased by 1 for each duplicate ACK.]
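The window inflation can be sketched as follows. The numbers follow the common Reno description (as in RFC 5681): on the third duplicate ACK, ssthresh = cwnd/2 and cwnd = ssthresh + 3, because those three duplicate ACKs already mean three segments have left the pipe; each further duplicate ACK adds 1; the ACK for the retransmitted data deflates cwnd back to ssthresh. The class name `FastRecovery` is hypothetical, and the actual retransmission is not modeled.

```python
class FastRecovery:
    """Reno-style fast recovery window inflation (segments), a simplified sketch."""

    def __init__(self, cwnd):
        self.cwnd = cwnd
        self.ssthresh = None
        self.in_recovery = False

    def on_third_dup_ack(self):
        # Halve the window; the lost segment would be retransmitted here.
        self.ssthresh = self.cwnd / 2
        # The three duplicate ACKs mean three segments have left the pipe.
        self.cwnd = self.ssthresh + 3
        self.in_recovery = True

    def on_additional_dup_ack(self):
        # Each further duplicate ACK: one more segment left, inflate by 1.
        if self.in_recovery:
            self.cwnd += 1

    def on_new_ack(self):
        # ACK for the retransmitted data: deflate back to the halved window.
        if self.in_recovery:
            self.cwnd = self.ssthresh
            self.in_recovery = False


fr = FastRecovery(cwnd=16)
fr.on_third_dup_ack()            # cwnd = 8 + 3
for _ in range(5):
    fr.on_additional_dup_ack()   # usable window grows by 1 per dup ACK
fr.on_new_ack()
print(fr.cwnd)                   # 8.0
```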

New-Reno extension

New-Reno continues with fast recovery if a partial ACK is received

[Figure: packets 1 and 2 lost from the same window. Old cwnd; new cwnd = (old cwnd)/2; the usable window is increased by 1 for each duplicate ACK until the ACK for LP is received.]

LP: last packet sent before loss detection
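A toy trace of New-Reno's partial-ACK handling, under stated assumptions: packets are numbered, an ACK carries the next expected packet number, every packet that was not lost is received and buffered, and each retransmission takes one RTT. `newreno_recovery` is a hypothetical helper. An ACK that does not cover LP (a partial ACK) makes the sender retransmit the next hole and stay in fast recovery, which is why New-Reno repairs only one packet per RTT.

```python
def newreno_recovery(lost, last_packet):
    """Trace New-Reno fast recovery for packets lost from one window.

    lost: packet numbers dropped (each is retransmitted once, in order).
    last_packet: LP, the last packet sent before loss was detected.
    Returns one (retransmitted, ack, partial) tuple per RTT; partial=True
    means the ACK does not cover LP, so the sender stays in fast recovery.
    """
    holes = sorted(lost)
    trace = []
    while holes:
        pkt = holes.pop(0)                     # retransmit first unACKed packet
        # The cumulative ACK now reaches the next hole, or just past LP.
        ack = holes[0] if holes else last_packet + 1
        partial = ack <= last_packet
        trace.append((pkt, ack, partial))
    return trace


print(newreno_recovery(lost=[3, 7], last_packet=10))
# [(3, 7, True), (7, 11, False)]
```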

Why use SACK?

Without SACK, the sender has to use one of the following retransmission strategies:

- Retransmit 1 dropped packet per RTT (Reno, New-Reno)

- Retransmit packets that might have been successfully delivered (Tahoe)

SACK option [RFC2018]

Ex: 2nd segment dropped (each segment has 500 bytes)

seg     ack     SACK1 left   SACK1 right
5000    5500
5500    (lost)
6000    5500    6000         6500
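The receiver side of this example can be sketched in a few lines. Assumptions: segments are aligned to a fixed size starting at `first_byte`, and the block ordering and block-count limit that RFC 2018 places on the option are not modeled. `ack_and_sack` is a hypothetical helper; each block is (left edge, right edge) with the right edge one past the last byte, as in the example.

```python
def ack_and_sack(received, first_byte, seg_size):
    """Receiver-side cumulative ACK and SACK blocks (simplified sketch).

    received: starting byte numbers of the segments that arrived.
    first_byte: first byte of the transfer; segments are assumed aligned
    to seg_size. Returns (ack, blocks).
    """
    starts = sorted(set(received))
    ack = first_byte
    for s in starts:                 # cumulative ACK covers the in-order prefix
        if s == ack:
            ack += seg_size
    blocks = []
    for s in starts:
        if s < ack:                  # already covered by the cumulative ACK
            continue
        if blocks and blocks[-1][1] == s:
            blocks[-1] = (blocks[-1][0], s + seg_size)   # extend a contiguous run
        else:
            blocks.append((s, s + seg_size))
    return ack, blocks


print(ack_and_sack({5000}, first_byte=5000, seg_size=500))        # (5500, [])
print(ack_and_sack({5000, 6000}, first_byte=5000, seg_size=500))  # (5500, [(6000, 6500)])
```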

SACK Congestion Control (1/2)

Conservative extensions to Reno: the fast recovery algorithm is modified

Uses a variable called “pipe” to estimate the outstanding data in the flow

Rules for changing the “pipe” variable: +1 when a packet is transmitted; -1 when a duplicate ACK is received
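A sketch of the "pipe" bookkeeping. It uses the two rules above plus the usual sending rule for this SACK scheme, that new or retransmitted data is sent only while pipe is smaller than cwnd. The starting value of pipe and the class name `PipeEstimator` are hypothetical, and other adjustments (such as those on partial ACKs) are left out.

```python
class PipeEstimator:
    """Estimate of packets outstanding in the path during SACK fast recovery."""

    def __init__(self, cwnd, outstanding):
        self.cwnd = cwnd
        self.pipe = outstanding  # hypothetical starting estimate

    def can_send(self):
        # Send new or retransmitted data only while pipe < cwnd.
        return self.pipe < self.cwnd

    def on_transmit(self):
        self.pipe += 1           # +1 when a packet is transmitted

    def on_dup_ack(self):
        self.pipe -= 1           # -1 when a duplicate ACK is received


p = PipeEstimator(cwnd=8, outstanding=15)
for _ in range(8):
    p.on_dup_ack()               # eight packets have left the network
print(p.pipe, p.can_send())      # 7 True
```

Decoupling "how much to send" (pipe vs. cwnd) from "what to send" (the scoreboard) is what lets a SACK sender retransmit more than one packet per RTT.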

SACK Congestion Control (2/2)

SACK sender tracks successfully sent packets using “scoreboard” structure

Missing packets are retransmitted

Similar to New-Reno in exiting from fast recovery: it exits after all data outstanding at the time of loss is ACKed
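A minimal scoreboard sketch over packet numbers, assuming SACK blocks arrive as half-open (left, right) ranges and that holes below the highest SACKed packet are treated as missing. `Scoreboard` and its methods are hypothetical names; real implementations are more careful about when a hole is declared lost.

```python
class Scoreboard:
    """SACK sender scoreboard sketch over packet numbers."""

    def __init__(self, first_unacked, last_sent):
        self.first_unacked = first_unacked  # next packet the receiver expects
        self.last_sent = last_sent          # highest packet sent when loss was detected
        self.sacked = set()                 # packets reported in SACK blocks

    def on_ack(self, cum_ack, sack_blocks):
        """cum_ack: next expected packet; sack_blocks: half-open (left, right) ranges."""
        self.first_unacked = max(self.first_unacked, cum_ack)
        for left, right in sack_blocks:
            self.sacked.update(range(left, right))

    def missing(self):
        """Unreceived packets below the highest SACKed packet: retransmit candidates."""
        if not self.sacked:
            return []
        highest = max(self.sacked)
        return [p for p in range(self.first_unacked, highest)
                if p not in self.sacked]

    def recovery_done(self):
        """Exit fast recovery once all data outstanding at loss time is ACKed."""
        return self.first_unacked > self.last_sent


sb = Scoreboard(first_unacked=3, last_sent=10)
sb.on_ack(3, [(4, 6), (8, 9)])   # packets 4, 5 and 8 arrived
print(sb.missing())              # [3, 6, 7]
```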

Simulation Model used

Three flows are set up from S1 to K1; the 2nd and 3rd flows are used to change the packet drop pattern of the 1st flow

One Packet Loss (1/2)

[Figure: trace marking the dropped packet and the retransmitted packet]

Performs slow start

One Packet Loss (2/2)

[Figure: trace marking the dropped packet and the retransmitted packet]

Performs fast recovery

Two Packet Loss (1/2)

[Figure: trace marking the dropped packets and the retransmitted packets]

Performs slow start

Two Packet Loss (2/2)

[Figure: trace marking the dropped packets]

Performs fast recovery

Three Packet Losses (1/3)

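[Figure: trace marking the dropped packets]

Has to wait for timeout

Three Packet Losses (2/3)

[Figure: trace marking the dropped packets]

No need for timeout; retransmits 1 pkt/RTT

Three Packet Losses (3/3)

[Figure: trace marking the dropped packets]

Retransmits more than 1 pkt/RTT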

Observations

Tahoe: Robust, performs slow start

Reno: For > 2 losses, a timeout is often needed

New-Reno: Can avoid timeouts, but still cannot retransmit > 1 pkt/RTT

SACK: Can retransmit > 1 pkt/RTT, and thus recovers from losses faster

Conclusions

SACK can improve TCP performance

SACK can be used on high-loss links too (Ex: wireless)

New-Reno demonstrates that certain problems of Reno can be avoided without SACK

Reno vs Vegas (Congestion Avoidance)

Reno's mechanism characteristics:

- uses the loss of segments as a signal

- reactive, not proactive

- needs to create losses to find the available bandwidth

Example: [Figure: threshold window, congestion window, and send window]

TCP Vegas idea: the source watches for some sign that the router's queue is building up and congestion will happen soon; e.g., the RTT grows or the sending rate flattens.

[Figure: three panels vs. time (seconds): congestion window (KB), average source send rate (sending KBps), and queue size in the router (buffer space at router)]

In the shaded region we expect throughput to increase, but it cannot increase beyond the available bandwidth.

Vegas' approach: basic idea

Vegas tries not to send at a rate that causes buffers to fill

It maintains the right amount of extra data in the network, based on changes in the estimated amount of extra data (window size vs. throughput)

It keeps the actual rate from straying too far from the available rate (resulting in a smooth congestion avoidance period)

Vegas Algorithm

Define a given connection's BaseRTT: BaseRTT = the minimum of all measured RTTs

Expected throughput (E) = WindowSize / BaseRTT

Calculate the current actual sending rate: Actual rate (A) = FlightSize / RTT

Compare A to E and adjust the window (linear increase or decrease):

If (E - A) > beta, cwnd-- (congestion state)

If (E - A) < alpha, cwnd++ (low utilization)

When a loss is detected, reduce the window by half
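One congestion-avoidance step of Vegas can be sketched as below. As an assumption about units, the difference E - A is multiplied by BaseRTT so that it reads as "extra segments queued in the network"; that makes it comparable with alpha and beta given in buffers (1 and 3 on the next slide). Rates are in segments per second, times in seconds, and `vegas_update` is a hypothetical helper; loss handling (halving) is not included.

```python
def vegas_update(cwnd, base_rtt, rtt, flight_size, alpha=1.0, beta=3.0):
    """One Vegas congestion-avoidance step (window in segments, times in s).

    Expected = cwnd / BaseRTT, Actual = FlightSize / RTT. The difference is
    scaled by BaseRTT so it counts extra segments sitting in router buffers.
    """
    expected = cwnd / base_rtt
    actual = flight_size / rtt
    diff = (expected - actual) * base_rtt
    if diff > beta:        # too much data queued: congestion building
        return cwnd - 1
    if diff < alpha:       # too little extra data: under-utilizing the path
        return cwnd + 1
    return cwnd            # between alpha and beta: leave cwnd unchanged


rtt_samples = [0.12, 0.10, 0.11]
base_rtt = min(rtt_samples)      # BaseRTT = minimum of all measured RTTs
print(vegas_update(20, base_rtt, rtt=0.20, flight_size=20))  # 19
```

Because the adjustment is one segment per RTT in either direction, the window settles into the band between alpha and beta instead of sawing up to a loss.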

Algorithm (cont)

Parameters: alpha = 1 buffer, beta = 3 buffers

[Figure: top panel: congestion window (KB) vs. time (seconds); bottom panel: sending rate (KBps) vs. time (seconds), with the congestion avoidance (CA) phase marked. Black line = actual rate; green line = expected rate; shaded = region between alpha and beta.]

Note: the linear decrease in Vegas does not violate AIMD, since it happens before packet loss.

Comparison of Reno and Vegas (Retransmission)

Reno's retransmission mechanism:

- retransmission timeout based on RTT and variance estimates (BSD-based: 500 ms)

- Fast Retransmit and Fast Recovery: when the sender receives duplicate ACKs, it reduces the window size by half and avoids a timeout, which would cause retransmission with slow start

If multiple drops occur, timeout and slow start will follow anyway.
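A timeout "based on RTT and variance estimates" can be sketched with the standardized form of this estimator (RFC 6298: gains 1/8 and 1/4, RTO = SRTT + max(G, 4 * RTTVAR), 1 s initial and minimum RTO). BSD-derived stacks implement the same idea in fixed-point arithmetic, so this is an illustration rather than the BSD code; the 0.5 s clock granularity default is an assumption echoing the 500 ms figure above, and `RtoEstimator` is a hypothetical name.

```python
class RtoEstimator:
    """RTT-and-variance retransmission timeout (RFC 6298 style), in seconds."""

    ALPHA = 1 / 8   # gain for the smoothed RTT
    BETA = 1 / 4    # gain for the RTT variation
    K = 4

    def __init__(self, granularity=0.5, min_rto=1.0):
        self.granularity = granularity  # clock tick G; 0.5 s is an assumption
        self.min_rto = min_rto
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0                  # initial RTO before any RTT sample

    def on_rtt_sample(self, r):
        if self.srtt is None:           # first measurement
            self.srtt = r
            self.rttvar = r / 2
        else:
            # RTTVAR is updated with the old SRTT, so it goes first.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.min_rto,
                       self.srtt + max(self.granularity, self.K * self.rttvar))
        return self.rto


est = RtoEstimator()
est.on_rtt_sample(0.2)           # RTO clamped to the 1 s minimum
print(est.on_rtt_sample(1.0))    # a sudden RTT jump inflates the variance term
```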

19% increase in throughput

Vegas’ Retransmission

reads and records the system clock each time a segment is sent

when an ACK arrives, Vegas reads the clock again

RTT calculation using this time and the timestamp recorded for the relevant segment

uses this more accurate RTT estimate to decide whether to retransmit
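The steps above can be sketched as a per-segment timestamp table. The retransmit check reflects how Vegas is usually described: when a duplicate ACK arrives, the segment is retransmitted if it has been outstanding longer than the timeout value, without waiting for three duplicate ACKs. The class name, the injectable `clock` argument, and the timeout values in the example are hypothetical.

```python
import time


class VegasRetransmitCheck:
    """Per-segment send timestamps and a fine-grained retransmit check."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.sent_at = {}                          # sequence number -> send time

    def on_send(self, seq):
        self.sent_at[seq] = self.clock()          # read and record the clock

    def rtt_for(self, seq):
        return self.clock() - self.sent_at[seq]   # read the clock again on ACK

    def should_retransmit_on_dup_ack(self, seq, timeout):
        # Retransmit if the segment has been outstanding longer than the
        # timeout, instead of waiting for three duplicate ACKs.
        return self.rtt_for(seq) > timeout


now = [0.0]                                        # fake clock for the example
check = VegasRetransmitCheck(clock=lambda: now[0])
check.on_send(1)
now[0] = 0.3
print(check.should_retransmit_on_dup_ack(1, timeout=0.25))  # True
```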

Some fun topics to discuss…

Modeling TCP throughput

Consider congestion avoidance only

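[Figure: cwnd vs. time in congestion avoidance: a sawtooth between W/2 and W, with each triple-duplicate ACK (TD) halving the window; ssthresh and the bottleneck bandwidth are marked.]

Assume one packet loss (loss event) per cycle. Each cycle lasts W/2 RTTs while the window grows linearly from W/2 to W, so the total number of packets sent per cycle is

$\frac{W}{2}\cdot\frac{W}{2} + \frac{1}{2}\cdot\frac{W}{2}\cdot\frac{W}{2} = \frac{W^2}{4} + \frac{W^2}{8} = \frac{3W^2}{8}$

Thus $p = \frac{1}{3W^2/8} = \frac{8}{3W^2} \Rightarrow W = \sqrt{\frac{8}{3p}} \approx \frac{1.6}{\sqrt{p}}$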

Modeling TCP throughput…

1/throughput = c * sqrt(p) * RTT
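The constant c follows from the same sawtooth: the window averages 3W/4 over a cycle, so throughput is (3W/4)/RTT packets per second. Substituting W from the previous slide (multiply by the segment size for bytes per second):

```latex
\text{throughput} = \frac{3W}{4\,RTT}
  = \frac{3}{4}\sqrt{\frac{8}{3p}}\cdot\frac{1}{RTT}
  = \sqrt{\frac{3}{2p}}\cdot\frac{1}{RTT}
  \approx \frac{1.22}{RTT\,\sqrt{p}}
\quad\Longrightarrow\quad
\frac{1}{\text{throughput}} = \sqrt{\tfrac{2}{3}}\,\sqrt{p}\,RTT,
\qquad c = \sqrt{\tfrac{2}{3}} \approx 0.82
```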

Equation-based Congestion Control

Applications that don't need reliability still want to be friendly to the network. At what rate should they send their UDP traffic?

Use a detailed TCP analysis to relate throughput to loss and RTT. Measure these values and then calculate the appropriate throughput directly. The result is a rate-based, equation-driven protocol called TFRC.
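A sketch of the rate calculation, using the TCP throughput equation as given in the TFRC specification (RFC 5348), which adds a timeout term to the simple square-root model. It assumes the RFC's suggested defaults b = 1 (packets acknowledged per ACK) and t_RTO = 4 * RTT; measuring p (the loss event rate) and the RTT is left out, and both function names are hypothetical.

```python
import math


def tfrc_rate(s, rtt, p, b=1, t_rto=None):
    """Allowed sending rate X in bytes/s from the RFC 5348 throughput equation.

    s: packet size (bytes); rtt: round-trip time (s); p: loss event rate
    (0 < p <= 1); b: packets acknowledged per ACK; t_rto: retransmission
    timeout (s), defaulting to 4 * rtt.
    """
    if t_rto is None:
        t_rto = 4 * rtt
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


def simple_rate(s, rtt, p):
    """Simplified model from the slides: about 1.22 * s / (RTT * sqrt(p))."""
    return math.sqrt(3 / (2 * p)) * s / rtt


for loss in (1e-4, 1e-2, 1e-1):
    print(loss, round(tfrc_rate(1460, 0.1, loss)), round(simple_rate(1460, 0.1, loss)))
```

At low loss rates the two agree; as p grows, the timeout term dominates and the equation-based rate falls well below the simple square-root estimate.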

mulTCP

Effect of AIMD parameters on the throughput of TCP
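The effect of the AIMD parameters can be estimated with the same sawtooth argument, generalized to an increase of a packets per RTT and a decrease from W to (1 - b)W on each loss. A cycle then lasts bW/a RTTs, carries bW^2(2 - b)/(2a) packets (= 1/p), and the window averages (1 - b/2)W, giving throughput = sqrt(a(2 - b)/(2bp)) / RTT. With a = 1 and b = 1/2 this reduces to the sqrt(3/(2p))/RTT result above. This is a sketch under the same one-loss-per-cycle assumption; `aimd_throughput` is a hypothetical helper, and mulTCP's aim of behaving like several standard flows corresponds to choosing a larger a and/or a smaller b.

```python
import math


def aimd_throughput(a, b, p, rtt):
    """Steady-state packets/s of AIMD(a, b) under the sawtooth model.

    a: window increase per RTT (packets); b: fraction of the window removed
    on a loss (W -> (1 - b) W); p: loss rate with one loss per cycle.
    """
    return math.sqrt(a * (2 - b) / (2 * b * p)) / rtt


standard = aimd_throughput(1, 0.5, p=0.01, rtt=0.1)   # ordinary TCP AIMD
faster = aimd_throughput(4, 0.5, p=0.01, rtt=0.1)     # 4x additive increase
print(standard, faster / standard)
```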
