Page 1: MB-NG Review High Performance Network Demonstration  21 April 2004

MB-NG Review – 24 April 2004

Richard Hughes-Jones
The University of Manchester, UK

MB-NG Review

High Performance Network Demonstration

21 April 2004

Page 2: MB-NG Review High Performance Network Demonstration  21 April 2004

MB-NG Review, April 2004

It works? So what’s the problem with TCP?

TCP has 2 phases: Slow Start and Congestion Avoidance.

AIMD and high-bandwidth, long-distance networks: the poor performance of TCP in high-bandwidth wide area networks is due in part to the TCP congestion control algorithm, which adjusts the congestion window (cwnd).

For each ACK in an RTT without loss:
cwnd -> cwnd + a / cwnd (Additive Increase, a = 1)

For each window experiencing loss:
cwnd -> cwnd - b × cwnd (Multiplicative Decrease, b = ½)

Time to recover from 1 packet loss, ~100 ms rtt:
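The recovery time follows from the two rules above: one loss removes b × cwnd segments, and additive increase then restores a segments per RTT. The sketch below is a rough estimate only. It assumes a 1 Gbit/s path and 1500 byte segments, neither of which is stated on the slide, and the function name is illustrative.

```python
def aimd_recovery_time_s(rate_bps, rtt_s, mss_bytes=1500, a=1.0, b=0.5):
    """Estimate the seconds standard AIMD needs to regain full cwnd after one loss."""
    # cwnd (in segments) needed to fill the path: the bandwidth-delay product
    full_cwnd = rate_bps * rtt_s / (8 * mss_bytes)
    # Multiplicative decrease removes b * cwnd segments
    deficit = b * full_cwnd
    # Additive increase restores a segments per RTT
    return (deficit / a) * rtt_s


if __name__ == "__main__":
    # 1 Gbit/s is an assumed link rate, not a value from the slide
    for rtt_ms in (6, 100, 120):
        t = aimd_recovery_time_s(1e9, rtt_ms / 1000)
        print(f"rtt {rtt_ms} ms: about {t:.1f} s to recover")
```

Under these assumptions the recovery time grows with the square of the rtt, which is why a path that recovers in seconds at 6 ms takes minutes at 100 ms.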

Page 3: MB-NG Review High Performance Network Demonstration  21 April 2004

Investigation of new TCP Stacks

High Speed TCP: a and b vary depending on the current cwnd, using a table.
  a increases more rapidly with larger cwnd, so the flow returns to the ‘optimal’ cwnd size for the network path sooner.
  b decreases less aggressively and, as a consequence, so does the cwnd. The effect is that the drop in throughput is smaller.

Scalable TCP: a and b are fixed adjustments for the increase and decrease of cwnd.
  a = 1/100: the increase is greater than TCP Reno.
  b = 1/8: the decrease on loss is less than TCP Reno.
  Scalable over any link speed.

Fast TCP: uses round trip time as well as packet loss to indicate congestion, with rapid convergence to a fair equilibrium for throughput.

HSTCP-LP: High Speed (Low Priority), backs off if rtt increases.
BiC-TCP: additive increase at large cwnd; binary search at small cwnd.
H-TCP: after congestion behaves as standard TCP, then switches to high performance.
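To make the Reno versus Scalable TCP comparison concrete, the sketch below counts the RTTs each needs to return to its pre-loss cwnd after a single loss. It assumes Scalable TCP adds a = 1/100 of a segment per ACK (roughly 1% growth per RTT) and that Reno adds one segment per RTT; the slide gives only the a and b values, so the per-RTT growth model and the function name are illustrative.

```python
def rtts_to_recover(stack, cwnd):
    """Count RTTs until cwnd returns to its pre-loss value after one loss."""
    if stack == "reno":
        b = 0.5
        grow = lambda w: w + 1.0      # a = 1: one extra segment per RTT
    elif stack == "scalable":
        b = 0.125
        grow = lambda w: w * 1.01     # a = 1/100 per ACK: about 1% per RTT (assumed model)
    else:
        raise ValueError(f"unknown stack: {stack}")
    w = cwnd * (1.0 - b)              # multiplicative decrease on loss
    rtts = 0
    while w < cwnd:
        w = grow(w)
        rtts += 1
    return rtts


if __name__ == "__main__":
    for cwnd in (100, 8333):          # example window sizes in segments
        print(cwnd, rtts_to_recover("reno", cwnd), rtts_to_recover("scalable", cwnd))
```

In this model the Scalable TCP recovery takes the same number of RTTs whatever the window size, which is one way to read "scalable over any link speed".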

Page 4: MB-NG Review High Performance Network Demonstration  21 April 2004

Comparison of TCP Stacks

TCP Response Function: Throughput vs Loss Rate. A steeper curve means faster recovery.
Packets were dropped in the kernel.

MB-NG rtt 6 ms
DataTAG rtt 120 ms
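The slide does not give a formula for the response function. For standard TCP a commonly used approximation is rate ≈ (MSS / RTT) × C / √p with C = √(3/2), and the sketch below evaluates it for the two rtt values quoted. The 1460 byte MSS and the function name are assumptions for illustration, and the result is not capped at the link rate.

```python
import math


def reno_response_bps(loss_rate, rtt_s, mss_bytes=1460, c=math.sqrt(1.5)):
    """Approximate steady-state throughput of standard TCP for a given loss rate."""
    return (mss_bytes * 8 / rtt_s) * c / math.sqrt(loss_rate)


if __name__ == "__main__":
    for name, rtt_s in (("MB-NG", 0.006), ("DataTAG", 0.120)):
        for p in (1e-4, 1e-5, 1e-6):
            mbps = reno_response_bps(p, rtt_s) / 1e6
            print(f"{name} rtt {rtt_s * 1000:.0f} ms, loss {p:g}: {mbps:.0f} Mbit/s")
```

For the same loss rate, this approximation gives the 120 ms path one twentieth of the throughput of the 6 ms path.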

Page 5: MB-NG Review High Performance Network Demonstration  21 April 2004

Multi-Gigabit flows at SC2003 BW Challenge

Three server systems with 10 Gigabit Ethernet NICs
Used the DataTAG altAIMD stack, 9000 byte MTU
Sent mem-mem iperf TCP streams from the SLAC/FNAL booth in Phoenix to:

Chicago Starlight: rtt 65 ms, window 60 MB, Phoenix CPU 2.2 GHz, 3.1 Gbit hstcp, I=1.6%
Amsterdam SARA: rtt 175 ms, window 200 MB, Phoenix CPU 2.2 GHz, 4.35 Gbit hstcp, I=6.9%

New TCP stacks are very stable
Both used Abilene to Chicago

[Figure: 10 Gbits/s throughput from SC2003 to Chicago & Amsterdam. Throughput (Gbits/s, 0 to 10) vs Date & Time (11/19/03 15:59 to 17:25). Series: Router traffic to Abilene, Phoenix-Chicago, Phoenix-Amsterdam.]
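The window sizes quoted for the two paths can be related to the rtt through the bandwidth-delay product. The sketch below is only arithmetic on the slide's figures, assuming 1 MB = 10^6 bytes; the function names are illustrative.

```python
def max_rate_gbps(window_bytes, rtt_s):
    """Highest rate one TCP stream can sustain with this window: window / rtt."""
    return window_bytes * 8 / rtt_s / 1e9


def window_needed_mbytes(rate_gbps, rtt_s):
    """Window (in 10^6 bytes) needed to keep rate_gbps in flight over rtt_s."""
    return rate_gbps * 1e9 * rtt_s / 8 / 1e6


if __name__ == "__main__":
    # (destination, rtt in s, window in MB, achieved Gbit/s) as quoted on the slide
    for dest, rtt_s, win_mb, achieved in (("Chicago", 0.065, 60, 3.1),
                                          ("Amsterdam", 0.175, 200, 4.35)):
        ceiling = max_rate_gbps(win_mb * 1e6, rtt_s)
        needed = window_needed_mbytes(achieved, rtt_s)
        print(f"{dest}: window allows {ceiling:.1f} Gbit/s, "
              f"achieved rate needs {needed:.0f} MB in flight")
```

On this reading, both configured windows were larger than the achieved rates required.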

Page 6: MB-NG Review High Performance Network Demonstration  21 April 2004

Transfer Applications – Throughput [1]

2 GByte file transferred, RAID0 disks, Manc – UCL
GridFTP: throughput alternates between 600/800 Mbit/s and zero
Apache web server + curl-based client: steady 720 Mbit/s

Page 7: MB-NG Review High Performance Network Demonstration  21 April 2004

Transfer Applications – Throughput [2]

2 GByte file transferred, RAID5 (4 disks), Manc – RAL
bbcp: mean 710 Mbit/s
GridFTP: see many zeros

[Plot annotations: Mean ~710; Mean ~620]

Page 8: MB-NG Review High Performance Network Demonstration  21 April 2004

Topology of the MB-NG Network

[Diagram labels: Manchester Domain; UCL Domain; RAL Domain; UKERNA Development Network; Edge Router Cisco 7609; Boundary Router Cisco 7609 (three); hosts man01, man02, man03, lon01, lon02, lon03, ral01, ral02. Key: Gigabit Ethernet; 2.5 Gbit POS Access; MPLS; Admin. Domains.]

Page 9: MB-NG Review High Performance Network Demonstration  21 April 2004

High Throughput Demo

[Diagram labels: Manchester, man03; London, lon01; Dual Xeon 2.2 GHz (two hosts); 1 GEth (two links); Cisco 7609 (two); Cisco GSR (two); 2.5 Gbit SDH MB-NG Core.]

Send data with TCP
Drop packets
Monitor TCP with Web100

Page 10: MB-NG Review High Performance Network Demonstration  21 April 2004

Standard to HS-TCP

No loss, but output queue filled by sender

Page 11: MB-NG Review High Performance Network Demonstration  21 April 2004

HS-TCP to Scalable

No loss, but output queue filled by sender

Page 12: MB-NG Review High Performance Network Demonstration  21 April 2004

Standard, HS-TCP, Scalable

Drop 1 in 25,000

Page 13: MB-NG Review High Performance Network Demonstration  21 April 2004

Standard Reno TCP

Drop 1 in 10^6

Page 14: MB-NG Review High Performance Network Demonstration  21 April 2004

Focus on Helping Real Users: Throughput CERN - SARA

Using the GÉANT Backup Link
1 GByte disk-disk transfers
Blue is the data; red is the TCP ACKs

Standard TCP: average throughput 167 Mbit/s (users see 5 - 50 Mbit/s!)
High-Speed TCP: average throughput 345 Mbit/s
Scalable TCP: average throughput 340 Mbit/s

Technology link to EU projects: DataGrid, DataTAG & GÉANT

[Figures: three time-series plots titled "Standard TCP txlen 100 25 Jan03", "Hispeed TCP txlen 2000 26 Jan03" and "Scalable TCP txlen 2000 27 Jan03". Axes: Time vs I/f Rate Mbits/s (0 to 500) and Recv. Rate Mbits/s (0 to 2). Series: Out Mbit/s, In Mbit/s.]

Page 15: MB-NG Review High Performance Network Demonstration  21 April 2004

BaBar Case Study: Host, PCI & RAID Controller Performance

RAID0 (striped) & RAID5 (striped with redundancy)
3Ware 7506 Parallel 66 MHz
3Ware 7505 Parallel 33 MHz
3Ware 8506 Serial ATA 66 MHz
ICP Serial ATA 33/66 MHz
Tested on Dual 2.2 GHz Xeon Supermicro P4DP8-G2 motherboard
Disk: Maxtor 160GB 7200rpm 8MB Cache
Read ahead kernel tuning: /proc/sys/vm/max-readahead

[Plots: Disk - Memory Read Speeds; Memory - Disk Write Speeds]
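The read-ahead setting named above is a single integer held in a /proc file, so it can be read and changed with ordinary file I/O. The sketch below is illustrative only: the path is the one quoted on the slide and may not exist on other kernels, writing to it normally needs root, and the helper names and the example value are assumptions.

```python
from pathlib import Path

# Path as quoted on the slide; treat it as kernel-specific
READAHEAD_PATH = Path("/proc/sys/vm/max-readahead")


def get_readahead(path=READAHEAD_PATH):
    """Return the current read-ahead limit stored at path."""
    return int(Path(path).read_text().strip())


def set_readahead(value, path=READAHEAD_PATH):
    """Write a new read-ahead limit to path (needs root for the real /proc file)."""
    if value <= 0:
        raise ValueError("read-ahead limit must be positive")
    Path(path).write_text(f"{value}\n")


if __name__ == "__main__":
    if READAHEAD_PATH.exists():
        print("current max-readahead:", get_readahead())
    else:
        print("this kernel does not expose", READAHEAD_PATH)
```

Passing a different `path` lets the helpers be exercised against an ordinary file before touching the real setting.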

Page 16: MB-NG Review High Performance Network Demonstration  21 April 2004

Topology of the MB-NG Network

[Diagram labels: Manchester Domain; UCL Domain; RAL Domain; UKERNA Development Network; Edge Router Cisco 7609; Boundary Router Cisco 7609 (three); hosts man01, man02, man03, lon01, lon02, lon03, ral01, ral02; HW RAID (two). Key: Gigabit Ethernet; 2.5 Gbit POS Access; MPLS; Admin. Domains.]

Page 17: MB-NG Review High Performance Network Demonstration  21 April 2004

BaBar Data: Throughput on MB–NG kit

RAID5 (4 disks), RAL - Manc
Includes small files (~KBytes)

bbftp 1 stream with compression
bbftp 6 streams
bbftp 1 stream, no compression
10 * 2 GByte files: each peak is a 20 GByte transfer
bbftp 1 stream, files ≥ 1 MByte

With bb diag

Page 18: MB-NG Review High Performance Network Demonstration  21 April 2004

Helping Real Users: Radio Astronomy VLBI

PoC with NRNs & GÉANT: 1024 Mbit/s, 24 on 7, NOW

Page 19: MB-NG Review High Performance Network Demonstration  21 April 2004

VLBI Project: Throughput, Jitter & 1-way Delay

1472 byte packets, Manchester -> Dwingeloo JIVE
Jitter: FWHM 22 µs (B2B 3 µs)
1-way delay: note the packet loss (points with zero 1-way delay)

[Plots:
 "1472 bytes w=50 jitter Gnt5-DwMk5 28Oct03": N(t) vs Jitter (us)
 "1472 bytes w12 Gnt5-DwMk5 21Oct03": 1-way delay (us) vs Packet No., shown for packets 2000 to 3000 and for packets 0 to 5000
 "Gnt5-DwMk5 11Nov03/DwMk5-Gnt5 13Nov03-1472bytes": Recv Wire rate (Mbits/s) vs Spacing between frames (us); series Gnt5-DwMk5 and DwMk5-Gnt5]
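The relation between frame spacing and wire rate in the last plot can be sketched with simple arithmetic. The code below assumes the 1472 byte packets are UDP payloads sent over Gigabit Ethernet, and that "wire rate" counts the UDP/IP headers and Ethernet framing overhead; the slide states neither, and the names are illustrative.

```python
UDP_IP_HEADERS = 8 + 20               # UDP header + IPv4 header, bytes
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12   # preamble+SFD, MAC header, FCS, inter-frame gap


def frame_wire_bytes(payload_bytes=1472):
    """Bytes one packet occupies on the wire, including all framing overhead."""
    return payload_bytes + UDP_IP_HEADERS + ETHERNET_OVERHEAD


def wire_rate_mbps(spacing_us, payload_bytes=1472, line_rate_mbps=1000.0):
    """Wire rate for frames sent every spacing_us, capped at the line rate."""
    bits = frame_wire_bytes(payload_bytes) * 8
    min_spacing_us = bits / line_rate_mbps   # 1 Mbit/s equals 1 bit per microsecond
    return bits / max(spacing_us, min_spacing_us)


if __name__ == "__main__":
    for spacing_us in (5, 12.3, 15, 20, 30, 40):
        print(f"{spacing_us:>5} us spacing: {wire_rate_mbps(spacing_us):.0f} Mbit/s")
```

Under these assumptions a frame takes about 12.3 µs on the wire, so spacings shorter than that cannot raise the rate above 1 Gbit/s.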

Page 20: MB-NG Review High Performance Network Demonstration  21 April 2004

Case Study: ATLAS LHC

Tests streaming built events from the Level3 Trigger to a remote compute farm in real time
  500 Mbit to 1 Gbit, CERN – Man
  Investigation of use of new high performance TCPs

Testing concepts in the ATLAS Offline Computing model
  More Mesh than Star: CERN Tier0 to Tier 1s; Tier 2s to all Tier 1s

Tests planned over production networks:
  Lancaster-Manchester NNW SuperJANET4 Lancaster-Manchester to CERN

Page 21: MB-NG Review High Performance Network Demonstration  21 April 2004

Page 22: MB-NG Review High Performance Network Demonstration  21 April 2004

Scalable TCP, DataTAG

Drop 1 in 10^6

Page 23: MB-NG Review High Performance Network Demonstration  21 April 2004

HS-TCP, DataTAG

Drop 1 in 10^6

Page 24: MB-NG Review High Performance Network Demonstration  21 April 2004

Standard Reno TCP, DataTAG

Drop 1 in 10^6
Transition from high-speed to Standard TCP at 520 s

Page 25: MB-NG Review High Performance Network Demonstration  21 April 2004

Summary

Multi-Gigabit transfers are possible and stable
Demonstrated that new TCP stacks help performance
DataTAG has made major contributions to understanding of high-speed networking
There has been significant technology transfer between DataTAG and other projects
Now reaching out to real users.

But still much research to do:
  Achieve performance: protocol vs implementation issues
  Stability / sharing issues
  Optical transports & hybrid networks

Page 26: MB-NG Review High Performance Network Demonstration  21 April 2004

10 Gigabit: Tuning PCI-X

Intel PRO/10GbE LR Adapter
16080 byte packets every 200 µs
PCI-X bus occupancy vs mmrbc (Max Memory Read Byte Count)

[Logic analyser traces for mmrbc 512, 1024, 2048 and 4096 bytes, annotated with CSR Access, PCI-X Sequence, Data Transfer, Interrupt & CSR Update; 5.7 Gbit/s at mmrbc 4096 bytes]

Measured times
Times based on PCI-X times from the logic analyser
Expected throughput ~7 Gbit/s

[Plots, "Kernel 2.6.1#17 HP Itanium Intel10GE Feb04": PCI-X transfer time (us) and PCI-X transfer rate (Gbit/s) vs Max Memory Read Byte Count (0 to 5000). Series: measured PCI-X transfer time, expected time, measured rate, rate from expected time, max throughput PCI-X.]
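A simple way to see why a larger mmrbc helps is to count the PCI-X sequences needed per packet and charge each one a fixed overhead. The sketch below assumes a 64-bit, 133 MHz PCI-X bus and a per-sequence overhead that is purely a placeholder; neither value comes from the slide, and the function names are illustrative.

```python
import math


def pcix_sequences(packet_bytes, mmrbc):
    """Number of memory-read sequences needed to move one packet."""
    return math.ceil(packet_bytes / mmrbc)


def pcix_transfer_time_us(packet_bytes, mmrbc, bus_mhz=133.0, bus_bytes=8,
                          overhead_us_per_seq=0.0):
    """Data clocks on the bus plus a fixed (hypothetical) cost per sequence."""
    data_clocks = math.ceil(packet_bytes / bus_bytes)
    data_us = data_clocks / bus_mhz
    return data_us + pcix_sequences(packet_bytes, mmrbc) * overhead_us_per_seq


def pcix_rate_gbps(packet_bytes, transfer_time_us):
    """Throughput implied by moving one packet in transfer_time_us."""
    return packet_bytes * 8 / transfer_time_us / 1000.0


if __name__ == "__main__":
    packet = 16080                       # packet size quoted on the slide
    for mmrbc in (512, 1024, 2048, 4096):
        # 1.0 us per sequence is a placeholder, not a measured value
        t = pcix_transfer_time_us(packet, mmrbc, overhead_us_per_seq=1.0)
        print(f"mmrbc {mmrbc}: {pcix_sequences(packet, mmrbc)} sequences, "
              f"{t:.1f} us, {pcix_rate_gbps(packet, t):.1f} Gbit/s")
```

With zero overhead the model returns the raw bus limit of about 8.5 Gbit/s for the assumed bus; any real per-sequence cost lowers it, and lowers it most at small mmrbc.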

Page 27: MB-NG Review High Performance Network Demonstration  21 April 2004

DataTAG Testbed

Page 28: MB-NG Review High Performance Network Demonstration  21 April 2004

BaBar Case Study: Disk Performance

BaBar Disk Server
  Tyan Tiger S2466N motherboard
  1 64bit 66 MHz PCI bus
  Athlon MP2000+ CPU
  AMD-760 MPX chipset
  3Ware 7500-8 RAID5
  8 * 200 GB Maxtor IDE 7200rpm disks

Note the VM parameter readahead max

Disk to memory (read): max throughput 1.2 Gbit/s (150 MBytes/s)
Memory to disk (write): max throughput 400 Mbit/s (50 MBytes/s) [not as fast as RAID0]

Page 29: MB-NG Review High Performance Network Demonstration  21 April 2004

RAID Controller Performance

[Plot grid: rows RAID0 and RAID5; columns Read Speed and Write Speed]

Page 30: MB-NG Review High Performance Network Demonstration  21 April 2004

BaBar: Serial ATA RAID Controllers, RAID5

3Ware 66 MHz PCI
ICP 66 MHz PCI

[Plots: throughput (Mbit/s) vs File size (MBytes, 0 to 2000), one curve per readahead max setting (31, 63, 127, 256, 512, 1200). Panels: "Read Throughput raid5 4 3Ware 66MHz SATA disk", "Write Throughput raid5 4 3Ware 66MHz SATA disk", "Read Throughput raid5 4 ICP 66MHz SATA disk", "Write Throughput raid5 4 ICP 66MHz SATA disk".]

Page 31: MB-NG Review High Performance Network Demonstration  21 April 2004

VLBI Project: Packet Loss Distribution

Measure the time between lost packets in the time series of packets sent.
Lost 1410 in 0.6 s
Is it a Poisson process?
Assume the Poisson process is stationary: λ(t) = λ
Use the probability density function: P(t) = λ e^(-λt)
Mean λ = 2360 / s [426 µs]
Plot log: slope -0.0028, expect -0.0024
Could be an additional process involved

[Plots, "packet loss distribution 12b", bin = 12 us: Number in Bin vs Time between lost frames (us), series Measured and Poisson; log-scale version with fits y = 41.832 e^(-0.0028x) and y = 39.762 e^(-0.0024x)]
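The Poisson expectation can be rebuilt from the loss count alone. The sketch below takes the 1410 losses in 0.6 s and the 12 µs bin width from the slide and derives the loss rate, the mean gap between losses, and the expected histogram curve N λ Δt e^(-λt). The function names are illustrative, and the rate computed this way (about 2350 /s) is close to, but not identical with, the 2360 /s quoted above.

```python
import math


def poisson_loss_model(n_lost, duration_s, bin_us):
    """Return (rate per s, mean gap in us, lambda per us, histogram amplitude)."""
    rate_per_s = n_lost / duration_s
    lam_per_us = rate_per_s / 1e6
    mean_gap_us = 1.0 / lam_per_us
    # Expected count in a bin of width bin_us at t = 0
    amplitude = n_lost * lam_per_us * bin_us
    return rate_per_s, mean_gap_us, lam_per_us, amplitude


def expected_in_bin(t_us, n_lost, lam_per_us, bin_us):
    """Expected number of inter-loss gaps falling in the bin that starts near t_us."""
    return n_lost * lam_per_us * bin_us * math.exp(-lam_per_us * t_us)


if __name__ == "__main__":
    rate, gap, lam, amp = poisson_loss_model(1410, 0.6, 12)
    print(f"rate {rate:.0f} /s, mean gap {gap:.1f} us")
    print(f"expected curve: y = {amp:.3f} * exp(-{lam:.5f} x)")
```

The amplitude and exponent that come out (about 39.762 and 0.00235 per µs) match the Poisson curve labelled on the plot, which has a shallower slope than the -0.0028 fitted to the measured data.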

Page 32: MB-NG Review High Performance Network Demonstration  21 April 2004

The performance of the end host / disks
BaBar Case Study: RAID BW & PCI Activity

3Ware 7500-8 RAID5 parallel EIDE
3Ware forces PCI bus to 33 MHz
BaBar Tyan to MB-NG SuperMicro: network mem-mem 619 Mbit/s
Disk - disk throughput with bbcp: 40-45 MBytes/s (320 - 360 Mbit/s)
PCI bus effectively full!
User throughput ~250 Mbit/s

[Plots: Read from RAID5 Disks; Write to RAID5 Disks]