
ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester1

Protocols Working with 10 Gigabit Ethernet

Richard Hughes-Jones The University of Manchester

www.hep.man.ac.uk/~rich/ then “Talks”

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester2

Introduction
10 GigE on SuperMicro X7DBE
10 GigE on SuperMicro X5DPE-G2
10 GigE and TCP – monitor with web100, disk writes
10 GigE and Constant Bit Rate transfers
UDP + memory access
GÉANT 4 Gigabit tests

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester3

Udpmon: Latency & Throughput Measurements

UDP/IP packets sent between back-to-back systems: similar processing to TCP/IP, but with no flow-control or congestion-avoidance algorithms.

Latency
Round-trip times measured using Request-Response UDP frames.
Latency as a function of frame size: the slope s is the sum over the data paths of the reciprocal transfer rates,
s = Σ_paths 1 / (db/dt)
i.e. mem-mem copy(s) + PCI + Gig Ethernet + PCI + mem-mem copy(s).
The intercept indicates processing times + HW latencies.
Histograms of 'singleton' measurements.
Tells us about: the behaviour of the IP stack, the way the HW operates, and interrupt coalescence.

UDP Throughput
Send a controlled stream of UDP frames spaced at regular intervals.
Vary the frame size and the frame transmit spacing & measure: the time of the first and last frames received; the number of packets received, lost & out of order; a histogram of the inter-packet spacing of the received packets; the packet-loss pattern; the 1-way delay; the CPU load; the number of interrupts.
Tells us about: the behaviour of the IP stack, the way the HW operates, and the capacity & available throughput of the LAN / MAN / WAN.
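To make the latency method concrete, a minimal request-response probe might look like the sketch below (the port number, the assumption of an echoing peer, and the 1000-sample loop are illustrative choices of mine, not the udpmon code itself):

```c
/* Request-response UDP latency sketch: send a frame, wait for the echo,
 * record the round-trip time; repeating gives the 'singleton' histogram. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

static double rtt_us(int sock, const struct sockaddr_in *peer, size_t len)
{
    char buf[9000];                    /* up to the 9000-byte MTU used here */
    struct timeval t0, t1;
    memset(buf, 0, len);

    gettimeofday(&t0, NULL);
    sendto(sock, buf, len, 0, (const struct sockaddr *)peer, sizeof(*peer));
    recv(sock, buf, sizeof(buf), 0);   /* assumes the peer echoes the frame */
    gettimeofday(&t1, NULL);

    return (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_usec - t0.tv_usec);
}

int main(int argc, char **argv)
{
    if (argc < 3) { fprintf(stderr, "usage: %s <host> <bytes>\n", argv[0]); return 1; }

    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in peer = { .sin_family = AF_INET, .sin_port = htons(5001) };
    inet_pton(AF_INET, argv[1], &peer.sin_addr);

    size_t len = (size_t)atoi(argv[2]);
    if (len > 9000) len = 9000;

    for (int i = 0; i < 1000; i++)     /* 1000 singleton RTT measurements */
        printf("%.1f\n", rtt_us(sock, &peer, len));

    close(sock);
    return 0;
}
```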

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester4

Throughput Measurements

UDP Throughput with udpmon: send a controlled stream of UDP frames spaced at regular intervals (frames of n bytes, a set number of packets, a chosen wait time between frames). The sender-receiver exchange is:

Zero stats, acknowledged with "OK done"
Send data frames at regular intervals: the sender records the time to send, the receiver records the time to receive and histograms the inter-packet time
Signal end of test, acknowledged with "OK done"
Get remote statistics: the receiver sends back the number received, the number lost plus the loss pattern, the number out of order, the CPU load & number of interrupts, and the 1-way delay
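A sender-side sketch of the stream test just described (the peer port, the 4-byte sequence-number header and the pacing loop are my own illustrative assumptions, not udpmon's actual wire protocol):

```c
/* Paced UDP stream sender sketch: frames of a given size sent with a fixed
 * inter-frame spacing, each carrying a sequence number so the receiver can
 * count loss and reordering. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc < 4) return 1;                       /* usage: host bytes spacing_us */
    size_t frame_bytes = (size_t)atoi(argv[2]);
    long   spacing_ns  = atol(argv[3]) * 1000L;

    int sock = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in peer = { .sin_family = AF_INET, .sin_port = htons(5001) };
    inet_pton(AF_INET, argv[1], &peer.sin_addr);

    static char frame[9000];                      /* jumbo-frame sized buffer */
    if (frame_bytes > sizeof(frame)) frame_bytes = sizeof(frame);

    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);

    for (uint32_t seq = 0; seq < 1000000; seq++) {
        uint32_t n = htonl(seq);
        memcpy(frame, &n, sizeof(n));             /* sequence-number header */
        sendto(sock, frame, frame_bytes, 0,
               (struct sockaddr *)&peer, sizeof(peer));

        /* Pace the stream: sleep until the next absolute transmit instant.
         * For the few-microsecond spacings used in these tests a busy-wait
         * on the clock would hold the spacing more accurately. */
        next.tv_nsec += spacing_ns;
        while (next.tv_nsec >= 1000000000L) { next.tv_nsec -= 1000000000L; next.tv_sec++; }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
    }
    close(sock);
    return 0;
}
```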

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester5

High-end Server PCs for 10 Gigabit

Boston/Supermicro X7DBE
Two Dual-Core Intel Xeon Woodcrest 5130, 2 GHz
Independent 1.33 GHz FSBs
530 MHz FD (serial) memory, parallel access to 4 banks
Chipsets: Intel 5000P MCH (PCIe & memory), ESB2 (PCI-X, GE, etc.)
PCI: three 8-lane PCIe buses, 3 × 133 MHz PCI-X
2 × Gigabit Ethernet, SATA

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester6

10 GigE Back2Back: UDP Latency
Motherboard: Supermicro X7DBE; chipset: Intel 5000P MCH
CPU: 2 × dual-core Intel Xeon 5130, 2 GHz, 4096k L2 cache; mem bus: 2 independent 1.33 GHz
PCI-e 8 lane; Linux kernel 2.6.20-web100_pktd-plus
Myricom NIC 10G-PCIE-8A-R Fibre; myri10ge v1.2.0 + firmware v1.4.10
rx-usecs=0, coalescence OFF, MSI=1, checksums ON, tx_boundary=4096
MTU 9000 bytes

Latency 22 µs & very well behaved
Latency slope 0.0028 µs/byte; B2B expect 0.00268 µs/byte:
Mem 0.0004 + PCI-e 0.00054 + 10GigE 0.0008 + PCI-e 0.00054 + Mem 0.0004
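The expected B2B slope is simply the series sum, from the slope formula earlier, of the per-element times per byte listed above: 0.0004 + 0.00054 + 0.0008 + 0.00054 + 0.0004 = 0.00268 µs/byte.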

[Plot gig6-5_Myri10GE_rxcoal=0: latency (µs) vs message length (bytes), 0-10000; fit y = 0.0028x + 21.937. Histogram panels N(t) vs latency (µs) for 64-, 3000- and 8900-byte frames.]
Histogram FWHM ~1-2 µs

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester7

10 GigE Back2Back: UDP Throughput
Kernel 2.6.20-web100_pktd-plus; Myricom 10G-PCIE-8A-R Fibre
rx-usecs=25, coalescence ON; MTU 9000 bytes
Max throughput 9.4 Gbit/s; note the rate for 8972-byte packets
~0.002% packet loss in 10M packets, in the receiving host
Sending host: 3 CPUs idle; for frame spacings < 8 µs, 1 CPU is >90% in kernel mode, inc. ~10% soft int
Receiving host: 3 CPUs idle; for frame spacings < 8 µs, 1 CPU is 70-80% in kernel mode, inc. ~15% soft int

[Plots gig6-5_myri10GE: receive wire rate (Mbit/s), % CPU1 in kernel mode on the sender, and % CPU1 in kernel mode on the receiver, each vs spacing between frames (µs), for packet sizes 1000-8972 bytes.]

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester8

10 GigE UDP Throughput vs packet size
Motherboard: Supermicro X7DBE; Linux kernel 2.6.20-web100_pktd-plus
Myricom NIC 10G-PCIE-8A-R Fibre; myri10ge v1.2.0 + firmware v1.4.10
rx-usecs=0, coalescence ON, MSI=1, checksums ON, tx_boundary=4096
Steps at 4060 and 8160 bytes, within 36 bytes of 2^n boundaries
Model the data transfer time as t = C + m × Bytes, where C includes the time to set up the transfers
Fit is reasonable: C = 1.67 µs, m = 5.4e-4 µs/byte; the steps are consistent with C increasing by 0.6 µs
The Myricom driver segments the transfers, limiting the DMA to 4096 bytes – PCI-e chipset dependent!
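A small sketch of that model (my own illustration using the fitted numbers above), showing how 4096-byte DMA segmentation produces the observed steps:

```c
/* Model of the transfer time t = C + m*Bytes, with the driver splitting each
 * packet into DMA segments of at most 4096 bytes and each extra segment
 * adding ~0.6 us of setup, reproducing steps just below 2^n boundaries. */
#include <stdio.h>

static double transfer_time_us(unsigned bytes)
{
    const double C_us       = 1.67;    /* fitted setup time                  */
    const double m_us       = 5.4e-4;  /* fitted time per byte (us/byte)     */
    const double seg_us     = 0.6;     /* extra setup per additional segment */
    const unsigned seg_size = 4096;    /* DMA limited to 4096 bytes          */

    unsigned segments = (bytes + seg_size - 1) / seg_size;
    return C_us + seg_us * (segments - 1) + m_us * bytes;
}

int main(void)
{
    /* print the modelled time either side of the first step */
    for (unsigned b = 4000; b <= 4200; b += 50)
        printf("%u bytes -> %.2f us\n", b, transfer_time_us(b));
    return 0;
}
```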

[Plot gig6-5_myri_udpscan: receive wire rate (Mbit/s) vs size of user data in packet (bytes), 0-10000.]

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester9

10 GigE via Cisco 7600: UDP Latency
Motherboard: Supermicro X7DBE; PCI-e 8 lane; Linux kernel 2.6.20 SMP
Myricom NIC 10G-PCIE-8A-R Fibre; myri10ge v1.2.0 + firmware v1.4.10
rx-usecs=0, coalescence OFF, MSI=1, checksums ON
MTU 9000 bytes
Latency 36.6 µs & very well behaved
Switch latency 14.66 µs; switch internal: 0.0011 µs/byte
(PCI-e 0.00054, 10GigE 0.0008)

[Plot gig6-Cisco-5_Myri_rxcoal0: latency (µs) vs message length (bytes), 0-10000; fit y = 0.0046x + 36.6.]

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester10

The “SC05” Server PCs

Boston/Supermicro X7DBE
Two Intel Xeon Nocona, 3.2 GHz, 2048k cache
Shared 800 MHz FSB
DDR2-400 memory
Chipset: Intel 7520 Lindenhurst
PCI: two 8-lane PCIe buses, one 4-lane PCIe bus, 3 × 133 MHz PCI-X
2 × Gigabit Ethernet

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester11

10 GigE X7DBE→X6DHE: UDP Throughput
Kernel 2.6.20-web100_pktd-plus; Myricom 10G-PCIE-8A-R Fibre
myri10ge v1.2.0 + firmware v1.4.10; rx-usecs=25, coalescence ON; MTU 9000 bytes
Max throughput 6.3 Gbit/s
Packet loss ~40-60% in the receiving host
Sending host: 3 CPUs idle; 1 CPU is >90% in kernel mode
Receiving host: 3 CPUs idle; for frame spacings < 8 µs, 1 CPU is 70-80% in kernel mode, inc. ~15% soft int

[Plots gig6-X6DHE_MSI_myri: receive wire rate (Mbit/s), % CPU1 in kernel mode on the receiver, and % packet loss, each vs spacing between frames (µs), for packet sizes 1000-8972 bytes.]

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester12

So now we can run at 9.4 Gbit/s

Can we do any work?

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester13

10 GigE X7DBE→X7DBE: TCP iperf
No packet loss; MTU 9000; TCP buffer 256k; BDP ≈ 330k
Web100 plots of the TCP parameters:
Cwnd: slow start then slow growth, limited by the sender!
Duplicate ACKs: one event of 3 DupACKs
Packets re-transmitted
Throughput: iperf throughput 7.77 Gbit/s

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester14

10 GigE X7DBE→X7DBE: TCP iperf
Packet loss 1:50,000 (recv-kernel patch); MTU 9000; TCP buffer 256k; BDP ≈ 330k
Web100 plots of the TCP parameters:
Cwnd: slow start then slow growth, limited by the sender!
Duplicate ACKs: ~10 DupACKs for every lost packet
Packets re-transmitted: one per lost packet
Throughput: iperf throughput 7.84 Gbit/s

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester15

10 GigE X7DBE→X7DBE: CBR/TCP
Packet loss 1:50,000 (recv-kernel patch); tcpdelay: 8120-byte messages, wait 7 µs; RTT 36 µs; TCP buffer 256k; BDP ≈ 330k
Web100 plots of the TCP parameters:
Cwnd: dips as expected
Duplicate ACKs: ~15 DupACKs for every lost packet
Packets re-transmitted: one per lost packet
Throughput: tcpdelay throughput 7.33 Gbit/s
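For context, a constant-bit-rate sender over TCP in the spirit of tcpdelay might look like the sketch below; the message size, 7 µs wait and 256k socket buffer mirror the figures above, while the peer address, port and busy-wait pacing are illustrative assumptions rather than the actual tcpdelay code:

```c
/* CBR-over-TCP sender sketch: write a fixed-size message, then wait a fixed
 * interval before the next one. Not the tcpdelay source. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static long long now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

int main(void)
{
    const size_t    msg_bytes = 8120;        /* message size used in the test */
    const long long wait_ns   = 7000;        /* 7 us spacing between messages */
    const int       sndbuf    = 256 * 1024;  /* 256k TCP buffer               */

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sndbuf, sizeof(sndbuf));

    struct sockaddr_in peer = { .sin_family = AF_INET, .sin_port = htons(5001) };
    inet_pton(AF_INET, "192.168.0.2", &peer.sin_addr);   /* placeholder peer */
    if (connect(sock, (struct sockaddr *)&peer, sizeof(peer)) < 0) return 1;

    static char msg[8120];
    memset(msg, 0, sizeof(msg));

    for (int i = 0; i < 1000000; i++) {
        long long next = now_ns() + wait_ns;
        send(sock, msg, msg_bytes, 0);
        while (now_ns() < next)              /* busy-wait: 7 us is too short to sleep */
            ;
    }
    close(sock);
    return 0;
}
```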

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester16

B2B UDP with memory access
Send UDP traffic B2B over 10GE; on the receiver run an independent memory-write task (L2 cache 4096 kByte; 8000 kByte blocks; 100% user mode).
Achievable UDP throughput: mean 9.39 Gbit/s sigma 106 (UDP only); mean 9.21 Gbit/s sigma 37 (UDP + cpu1); mean 9.2 Gbit/s sigma 30 (UDP + cpu3)
Packet loss: mean 0.04% (UDP only); mean 1.4% (UDP + cpu1); mean 1.8% (UDP + cpu3)
CPU load:
Cpu0 : 6.0% us, 74.7% sy, 0.0% ni, 0.3% id, 0.0% wa, 1.3% hi, 17.7% si, 0.0% st
Cpu1 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si, 0.0% st
Cpu2 : 0.0% us, 0.0% sy, 0.0% ni, 100.0% id, 0.0% wa, 0.0% hi, 0.0% si, 0.0% st
Cpu3 : 100.0% us, 0.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si, 0.0% st
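A minimal sketch of the kind of independent memory-write task described above (the 8000 kByte block size is from the slide; the loop structure is my own illustration):

```c
/* Memory-write load generator sketch: repeatedly write through an 8000 kByte
 * block, i.e. much larger than the 4096 kByte L2 cache, so the writes keep
 * missing in cache and load the memory system while staying 100% user mode. */
#include <stdlib.h>

int main(void)
{
    const size_t block_bytes = 8000UL * 1024UL;      /* 8000 kByte blocks */
    volatile long *block = malloc(block_bytes);
    if (!block) return 1;

    const size_t n = block_bytes / sizeof(long);
    for (;;)                                          /* run until killed  */
        for (size_t i = 0; i < n; i++)
            block[i] = (long)i;                       /* streaming writes  */
}
```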

[Plots gig6-5_udpmon_membw: receive wire rate (Mbit/s) and % packet loss vs trial number, for UDP, UDP+cpu1 and UDP+cpu3.]

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester17

ESLEA-FABRIC: 4 Gbit flows over GÉANT
Set up a 4 Gigabit lightpath between GÉANT PoPs
Collaboration with Dante
GÉANT Development Network: London – London or London – Amsterdam
and GÉANT Lightpath service: CERN – Poznan
PCs in their PoPs with 10 Gigabit NICs

VLBI tests: UDP performance
Throughput, jitter, packet loss, 1-way delay, stability
Continuous (days) data flows: VLBI_UDP and multi-Gigabit TCP performance with current kernels
Experience for FPGA Ethernet packet systems

Dante interests: multi-Gigabit TCP performance; the effect of (Alcatel) buffer size on bursty TCP using BW-limited lightpaths

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester18

Options Using the GÉANT Development Network
10 Gigabit SDH backbone; Alcatel 1678 MCC
Node locations: London, Amsterdam, Paris, Prague, Frankfurt
Can do traffic routing, so can make long-RTT paths
Available now (2007); less pressure for long-term tests

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester19

Options Using the GÉANT Lightpaths
Set up a 4 Gigabit lightpath between GÉANT PoPs
Collaboration with Dante; PCs in Dante PoPs
10 Gigabit SDH backbone; Alcatel 1678 MCC
Node locations: Budapest, Geneva, Frankfurt, Milan, Paris, Poznan, Prague, Vienna
Can do traffic routing, so can make long-RTT paths
Ideal: London – Copenhagen

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester20

Any Questions?

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester21

Backup Slides

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester22

10 Gigabit Ethernet: UDP Throughput

1500-byte MTU gives ~2 Gbit/s; used a 16144-byte MTU, max user length 16080
DataTAG Supermicro PCs: dual 2.2 GHz Xeon CPU, FSB 400 MHz, PCI-X mmrbc 512 bytes: wire-rate throughput of 2.9 Gbit/s
CERN OpenLab HP Itanium PCs: dual 1.0 GHz 64-bit Itanium CPU, FSB 400 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.7 Gbit/s
SLAC Dell PCs: dual 3.0 GHz Xeon CPU, FSB 533 MHz, PCI-X mmrbc 4096 bytes: wire rate of 5.4 Gbit/s

[Plot an-al 10GE Xsum 512kbuf MTU16114 27Oct03: receive wire rate (Mbit/s) vs spacing between frames (µs), for packet sizes 1472-16080 bytes.]

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester23

10 Gigabit Ethernet: Tuning PCI-X

16080-byte packets every 200 µs; Intel PRO/10GbE LR adapter
PCI-X bus occupancy vs mmrbc
Measured times, and times based on PCI-X timings from the logic analyser
Expected throughput ~7 Gbit/s; measured 5.7 Gbit/s

[Logic-analyser traces of the PCI-X sequence (CSR access, data transfer, interrupt & CSR update) for mmrbc 512, 1024, 2048 and 4096 bytes (5.7 Gbit/s). Kernel 2.6.1 #17, HP Itanium, Intel 10GE, Feb 04.]
[Plots: PCI-X transfer time (µs), measured rate (Gbit/s), rate from expected time (Gbit/s) and max PCI-X throughput vs max memory read byte count, for the HP Itanium and DataTAG Xeon 2.2 GHz hosts.]

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester24

10 Gigabit Ethernet: TCP Data transfer on PCI-X

Sun V20z, 1.8 GHz to 2.6 GHz dual Opterons, connected via a 6509
XFrame II NIC; PCI-X mmrbc 4096 bytes, 66 MHz
Two 9000-byte packets back-to-back; average rate 2.87 Gbit/s
Burst of packets, length 646.8 µs; gap between bursts 343 µs; 2 interrupts / burst
[Logic-analyser trace: CSR access and data transfer.]

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester25

10 Gigabit Ethernet: UDP Data transfer on PCI-X
Sun V20z, 1.8 GHz to 2.6 GHz dual Opterons, connected via a 6509
XFrame II NIC; PCI-X mmrbc 2048 bytes, 66 MHz
One 8000-byte packet: 2.8 µs for CSRs, 24.2 µs data transfer; effective rate 2.6 Gbit/s
2000-byte packets, wait 0 µs: ~200 ms pauses
8000-byte packets, wait 0 µs: ~15 ms between data blocks
[Logic-analyser trace: CSR access (2.8 µs) and data transfer.]
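As a consistency check, the quoted effective rate follows from the data-transfer time alone: 8000 bytes × 8 bits / 24.2 µs ≈ 2.6 Gbit/s.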

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester26

10 Gigabit Ethernet: Neterion NIC Results
X5DPE-G2 Supermicro PCs back-to-back; dual 2.2 GHz Xeon CPU, FSB 533 MHz
XFrame II NIC; PCI-X mmrbc 4096 bytes
UDP: low rates, ~2.5 Gbit/s, with large packet loss
TCP: one iperf TCP data stream, 4 Gbit/s; two bi-directional iperf TCP data streams, 3.8 & 2.2 Gbit/s

[Plots s2io 9k 3d Feb 06: receive wire rate (Mbit/s) and % packet loss vs spacing between frames (µs), for packet sizes 1472-8972 bytes.]

ESLEA Closing Conference, Edinburgh, March 2007, R. Hughes-Jones Manchester27

SC|05 Seattle-SLAC 10 Gigabit Ethernet
2 lightpaths: routed over ESnet; layer 2 over UltraScience Net
6 Sun V20Z systems per λ
dCache remote disk data access: 100 processes per node; each node sends or receives; one data stream is 20-30 Mbit/s
Used Neterion NICs & Chelsio TOE
Data also sent to StorCloud using fibre channel links
Traffic on the 10 GE link for 2 nodes: 3-4 Gbit per node, 8.5-9 Gbit on the trunk
