
Slide 1

High TCP performance over wide area networks

Arlington, VA May 8, 2002

Sylvain Ravot <[email protected]>, Caltech

HENP Working Group


Slide 2

HENP WG Goal #3

Share information and provide advice on the configuration of routers, switches, PCs and network interfaces, and network testing and problem resolution, to achieve high performance over local and wide area networks in production.


Slide 3

Overview

• TCP

• TCP congestion avoidance algorithm

• TCP parameters tuning

• Gigabit Ethernet adapter performance


Slide 4

TCP Algorithms

• Slow Start
• Connection opening: cwnd = 1 segment
• Exponential increase of cwnd until cwnd = SSTHRESH
• On retransmission timeout: SSTHRESH := cwnd/2, cwnd := 1 segment

• Congestion Avoidance (entered when cwnd = SSTHRESH)
• Additive increase of cwnd
• On retransmission timeout: SSTHRESH := cwnd/2, back to slow start
• On 3 duplicate ACKs received: enter fast recovery

• Fast Recovery
• Exponential increase beyond cwnd
• On retransmission timeout: SSTHRESH := cwnd/2, back to slow start
• When the expected ACK is received: cwnd := cwnd/2, back to congestion avoidance
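A toy model (not from the slides) of how these state transitions might be expressed in code; window sizes are in segments and the many corner cases of a real stack are ignored:

# Toy model of the TCP Reno-style state transitions listed above.
class TcpRenoCwnd:
    def __init__(self, ssthresh=64):
        self.cwnd = 1.0            # connection opening: cwnd = 1 segment
        self.ssthresh = ssthresh
        self.state = "slow_start"

    def on_ack(self):
        if self.state == "slow_start":
            self.cwnd += 1.0       # exponential increase (one segment per ACK)
            if self.cwnd >= self.ssthresh:
                self.state = "congestion_avoidance"
        elif self.state == "congestion_avoidance":
            self.cwnd += 1.0 / self.cwnd   # additive increase: +1 segment per RTT

    def on_three_dup_acks(self):
        # 3 duplicate ACKs received: remember half the window, enter fast recovery
        self.ssthresh = self.cwnd / 2
        self.state = "fast_recovery"

    def on_new_ack_in_recovery(self):
        # expected ACK received: cwnd := cwnd/2, back to congestion avoidance
        self.cwnd = self.cwnd / 2
        self.state = "congestion_avoidance"

    def on_timeout(self):
        # retransmission timeout: SSTHRESH := cwnd/2, cwnd := 1 segment
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0
        self.state = "slow_start"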


Slide 5

TCP Congestion Avoidance behavior (I)

• Assumptions
• The time spent in slow start is neglected
• The time to recover a loss is neglected
• No buffering (max. congestion window size = bandwidth-delay product)
• Constant RTT

• The congestion window is opened at the constant rate of one segment per RTT, so each cycle lasts W/2 RTTs.

• The throughput is the area under the curve (a short derivation follows the figure note below).

[Figure: cwnd sawtooth oscillating between W/2 and W as a function of time, measured in RTTs.]
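Under these assumptions, the average throughput follows from the standard area-under-the-sawtooth calculation (a sketch, not spelled out on the slide; W is the window in segments and MSS is the assumed segment size):

\[
\text{Data per cycle} \approx \frac{W}{2}\cdot\frac{W}{2} + \frac{1}{2}\cdot\frac{W}{2}\cdot\frac{W}{2} = \frac{3}{8}W^{2}\ \text{segments},
\qquad
\text{Cycle length} = \frac{W}{2}\,RTT,
\]
\[
\overline{\text{Throughput}} = \frac{\tfrac{3}{8}W^{2}\,MSS}{\tfrac{W}{2}\,RTT} = \frac{3}{4}\cdot\frac{W\cdot MSS}{RTT}.
\]

With no buffering, W · MSS equals the bandwidth-delay product, so the steady-state average is about 3/4 of the bottleneck bandwidth (roughly 450 Mbps for the 600 Mbps example on the next slide).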


Slide 6

Example

• Assumptions
• Bandwidth = 600 Mbps
• RTT = 170 ms (CERN – Caltech)
• BDP = 12.75 Mbytes
• Cycle = 12.3 minutes
• Time to transfer 10 Gbytes?

[Figure: cwnd sawtooth between W/2 and W over time (RTT), with one 12.3-minute cycle and the initial SSTHRESH marked.]

• 3.8 minutes to transfer 10 Gbytes if cwnd = 6.45 Mbytes at the beginning of the congestion avoidance state (throughput = 350 Mbps).

• 2.4 minutes to transfer 10 Gbytes if cwnd = 12.05 Mbytes at the beginning of the congestion avoidance state (throughput = 550 Mbps).

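A quick numerical check of these figures (a toy model, not the author's calculation: it assumes MSS = 1460 bytes, additive increase of one segment per RTT, a rate capped at the link bandwidth, and no losses):

# Rough check of the slide's numbers.
MSS = 1460 * 8            # assumed segment size, in bits
BW = 600e6                # bandwidth, bit/s
RTT = 0.170               # round-trip time, s
BDP = BW * RTT            # bandwidth-delay product, bits
GOAL = 10 * 8e9           # 10 Gbytes to transfer, in bits

def transfer_time(cwnd_bits):
    """Minutes needed to send GOAL bits, starting congestion avoidance at cwnd_bits."""
    sent, t = 0.0, 0.0
    while sent < GOAL:
        sent += min(cwnd_bits, BDP)   # at most one BDP of data per RTT
        cwnd_bits += MSS              # additive increase: one segment per RTT
        t += RTT
    return t / 60

print(BDP / 8 / 1e6)                  # 12.75 Mbytes
print((BDP / 2 / MSS) * RTT / 60)     # cycle of about 12.4 minutes (slide: 12.3)
print(transfer_time(6.45e6 * 8))      # about 3.8 minutes, i.e. roughly 350 Mbps
print(transfer_time(12.05e6 * 8))     # about 2.3 minutes here (slide: 2.4 minutes, 550 Mbps)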


Slide 7

TCP Congestion Avoidance behavior (II)

• Area #1
• Cwnd < BDP => throughput < bandwidth
• RTT constant
• Throughput = Cwnd / RTT

• Area #2
• Cwnd > BDP => throughput = bandwidth
• RTT increases (proportionally to cwnd)

• We now take the buffering space into account.

[Figure: the cwnd sawtooth with the BDP level and the buffering capacity above it marked; Area #1 is the region where cwnd < BDP, Area #2 the region between BDP and BDP + buffering capacity.]
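The two areas can be summarized in one formula (a restatement of the bullets above; RTT_0 denotes the fixed round-trip time of Area #1, a symbol added here):

\[
\text{Throughput} = \frac{cwnd}{RTT} = \min\!\left(\frac{cwnd}{RTT_{0}},\ \text{Bandwidth}\right),
\qquad
RTT = \max\!\left(RTT_{0},\ \frac{cwnd}{\text{Bandwidth}}\right).
\]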


Slide 8

Tuning

• Keep the congestion window size in the yellow area (see the figures below):
• Limit the maximum congestion window size to avoid loss
• Smaller backoff

[Figures: two cwnd vs. time sketches with the BDP level marked, illustrating the two tuning ideas detailed below.]

• Limit the maximum congestion avoidance window size
• In the application
• In the OS

• Smaller backoff
• TCP multi-streams
• After a loss: cwnd := cwnd × back_off, with 0.5 < back_off < 1 (see the note below)
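With N parallel streams, a loss on a single stream halves only that stream's window, so the aggregate window shrinks by a smaller factor (a standard observation, not stated explicitly on the slide):

\[
cwnd_{\text{aggregate}} \;\longrightarrow\; \left(1 - \frac{1}{2N}\right) cwnd_{\text{aggregate}},
\qquad \text{i.e. an effective } back\_off = 1 - \frac{1}{2N}.
\]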

• By limiting the maximum congestion avoidance window size and setting a large initial SSTHRESH, we reached 125 Mbps between CERN and Caltech and 143 Mbps between CERN and Chicago over the 155 Mbps transatlantic link.


Slide 9

Tuning TCP parameters

• Buffer space that the kernel allocates for each socket
• Kernel 2.2:
  echo 262144 > /proc/sys/net/core/rmem_max
  echo 262144 > /proc/sys/net/core/wmem_max
• Kernel 2.4:
  echo "4096 87380 4194304" > /proc/sys/net/ipv4/tcp_rmem
  echo "4096 65536 4194304" > /proc/sys/net/ipv4/tcp_wmem
  (the three values are respectively min, default, and max)

• Socket buffer settings
• setsockopt() with SO_RCVBUF and SO_SNDBUF
• Has to be called after socket() but before bind() (see the sketch below)
• Kernel 2.2: default value is 32 KB
• Kernel 2.4: default value can be set in /proc/sys/net/ipv4 (see above)
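A minimal sketch of this setsockopt() usage, written in Python rather than the C API the slide refers to; the host name is taken from the iperf example later in the talk and the 4 Mbyte buffer size is an illustrative value:

import socket

BUF = 4 * 1024 * 1024              # example buffer size, sized roughly to the path's BDP

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Set the socket buffers right after socket() and before bind()/connect(),
# as recommended above, so the sizes are in place when the connection is set up.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF)
s.connect(("pcgiga-gbe", 5001))    # hypothetical receiver; 5001 is iperf's default port
s.sendall(b"x" * BUF)              # send one buffer's worth of data
s.close()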

• Initial SSTHRESH
• Set the initial SSTHRESH to a value larger than the bandwidth-delay product
• There is no parameter to set this value in Linux 2.2 and 2.4 => modified Linux kernel

[Diagram: slow start (connection opening with cwnd = 1 segment; exponential increase of cwnd until cwnd = SSTHRESH) followed by congestion avoidance (additive increase of cwnd) once cwnd = SSTHRESH.]


Slide 10

Gigabit Ethernet NIC performance

• NICs tested
• 3Com: 3C996-T
• Syskonnect: SK-9843 SK-NET GE SX
• Intel: PRO/1000 T and PRO/1000 XF
• 32-bit and 64-bit PCI motherboards

• Measurements
• Back-to-back Linux PCs
• Latest drivers available
• TCP throughput
• Two different tests: Iperf and gensink. Gensink is a tool written at CERN for benchmarking TCP network performance.

• Performance measurement with Iperf:
• We ran 10 consecutive TCP transfers of 20 seconds each. Using the time command, we measured the CPU utilization.
• [root@pcgiga-2]# time iperf -c pcgiga-gbe -t 20
• We report the min/avg/max throughput of the 10 transfers (a small driver sketch follows this slide).

• Performance measurement with gensink:
• We ran transfers of 10 Gbytes. Gensink allows us to measure the throughput and the CPU utilization over the last 10 Mbytes transmitted.
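A small driver (not part of the talk) that repeats the iperf command ten times and collects the reported throughputs; it assumes iperf's usual "... Mbits/sec" summary line on stdout and does not reproduce the time-based CPU measurement:

import re
import subprocess

results = []
for _ in range(10):
    out = subprocess.run(["iperf", "-c", "pcgiga-gbe", "-t", "20", "-f", "m"],
                         capture_output=True, text=True).stdout
    rates = re.findall(r"([\d.]+)\s+Mbits/sec", out)
    if rates:
        results.append(float(rates[-1]))   # throughput from the run's summary line

print("min/avg/max = %.0f / %.0f / %.0f Mbps"
      % (min(results), sum(results) / len(results), max(results)))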


Slide 11

Syskonnect - SX, PCI 32 bit 33 MHz

• Setup:
• GbE adapter: SK-9843 SK-NET GE SX; driver included in the kernel
• CPU: PIV (1500 MHz), PCI: 32 bit 33 MHz
• Motherboard: Intel D850GB
• RedHat 7.2, kernel 2.4.17

• Iperf test:

  Throughput (Mbps) CPU utilization (%) CPU utilization per Mbps (% / Mbps)
Min. 443 44.5 0.100
Max. 449 50 0.111
Average 428.9 46.4 0.103

• Gensink test:
Throughput min / avg / max = 256 / 448 / 451 Mbps; CPU utilization average = 0.097 sec/Mbyte

[Plots: TCP throughput (Mbit/s) and CPU utilization (sec/Mbyte) vs. data transferred (Mbyte) over the 10 Gbyte gensink transfer.]


Slide 12

Intel - SX, PCI 32 bit 33 MHz

• Setup:
• GbE adapter: Intel PRO/1000 XF; driver e1000, version 4.1.7
• CPU: PIV (1500 MHz), PCI: 32 bit 33 MHz
• Motherboard: Intel D850GB
• RedHat 7.2, kernel 2.4.17

• Iperf test:

  Throughput (Mbps) CPU utilization (%) CPU utilization per Mbps (% / Mbps)
Min. 601 48.5 0.081
Max. 607 53 0.087
Average 605.5 52 0.086

• Gensink test:
Throughput min / avg / max = 380 / 609 / 631 Mbps; CPU utilization average = 0.040 sec/Mbyte

[Plots: TCP throughput (Mbit/s) and CPU utilization (sec/Mbyte) vs. data transferred (MByte) over the 10 Gbyte gensink transfer.]


Slide 13

3Com - Cu, PCI 64 bit 66 MHz

• Setup:
• GbE adapter: 3C996-T; driver bcm5700, version 2.0.18
• CPU: 2 x AMD Athlon MP, PCI: 64 bit 66 MHz
• Motherboard: dual AMD Athlon MP motherboard
• RedHat 7.2, kernel 2.4.7

• Iperf test:

  Throughput (Mbps) CPU utilization (%) CPU utilization per Mbit/s (% / Mbps)
Min. 835 43.8 0.052
Max. 843 51.5 0.061
Average 838 46.9 0.056

• Gensink test:
Throughput min / avg / max = 232 / 889 / 945 Mbps; CPU utilization average = 0.0066 sec/Mbyte

[Plots: TCP throughput (Mbit/s) and CPU utilization (sec/Mbyte) vs. data transferred (Mbyte) over the 10 Gbyte gensink transfer.]


Slide 14

Intel - Cu, PCI 64 bit 66 MHz

• Setup:
• GbE adapter: Intel PRO/1000 T; driver e1000, version 4.1.7
• CPU: 2 x AMD Athlon MP, PCI: 64 bit 66 MHz
• Motherboard: dual AMD Athlon MP motherboard
• RedHat 7.2, kernel 2.4.7

• Iperf test:

  Throughput (Mbps) CPU utilization (%) CPU utilization per Mbit/s (% / Mbps)
Min. 813 41 0.050
Max. 873 47.5 0.054
Average 846.1 44.5 0.053

• Gensink test:
Throughput min / avg / max = 429 / 905 / 943 Mbps; CPU utilization average = 0.0065 sec/Mbyte

[Plots: TCP throughput (Mbit/s) and CPU utilization (sec/MByte) vs. data transferred (Mbyte) over the 10 Gbyte gensink transfer.]


Slide 15

Intel - SX, PCI 64 bit 66 MHz

• Setup:
• GbE adapter: Intel PRO/1000 XF; driver e1000, version 4.1.7
• CPU: 2 x AMD Athlon MP, PCI: 64 bit 66 MHz
• Motherboard: dual AMD Athlon MP motherboard
• RedHat 7.2, kernel 2.4.7

• Iperf test:

  Throughput (Mbps) CPU utilization (%) CPU utilization per Mbit/s (% / Mbps)
Min. 828 43.2 0.052
Max. 877 49.1 0.056
Average 854 45.8 0.054

• Gensink test:
Throughput min / avg / max = 222 / 799 / 940 Mbps; CPU utilization average = 0.0062 sec/Mbyte

[Plots: TCP throughput (Mbit/s) and CPU utilization (sec/MByte) vs. data transferred (MByte) over the 10 Gbyte gensink transfer.]


Slide 16

Syskonnect - SX, PCI 64 bit 66 MHz

• Setup:
• GbE adapter: SK-9843 SK-NET GE SX; driver included in the kernel
• CPU: 2 x AMD Athlon MP, PCI: 64 bit 66 MHz
• Motherboard: dual AMD Athlon MP motherboard
• RedHat 7.2, kernel 2.4.7

• Iperf test:

  Throughput (Mbps) CPU utilization (%) CPU utilization per Mbps (% / Mbps)
Min. 874 67.5 0.077
Max. 909 69 0.076
Average 894.9 67.9 0.076

• Gensink test:
Throughput min / avg / max = 146 / 936 / 947 Mbps; CPU utilization average = 0.0083 sec/Mbyte

[Plots: TCP throughput (Mbit/s) and CPU utilization (sec/Mbyte) vs. data transferred (MByte) over the 10 Gbyte gensink transfer.]


Slide 17

Summary

• 32-bit PCI bus
• Intel NICs achieved the highest throughput (600 Mbps) with the smallest CPU utilization.
• Syskonnect NICs achieved only 450 Mbps with a higher CPU utilization.

• 32-bit vs. 64-bit PCI bus
• A 64-bit PCI bus is needed to get high throughput:
• We doubled the throughput by moving the Syskonnect NICs from 32-bit to 64-bit PCI buses.
• We increased the throughput by 300 Mbps by moving the Intel NICs from 32-bit to 64-bit PCI buses.

• 64-bit PCI bus
• Syskonnect NICs achieved the highest throughput (930 Mbps) with the highest CPU utilization.
• Intel NIC performance is unstable.
• 3Com NICs are a good compromise between stability, performance, CPU utilization and cost. Unfortunately, we could not test the 3Com NIC with a fiber connector.

• Cu vs. fiber connector
• We could not measure significant differences.

• Strange behavior of Intel NICs: the throughput achieved by Intel NICs is unstable.


Slide 18

Questions ?