


Network File System (NFS) in High Performance Networks

PDLSVD08-02

Wittawat Tantisiriroj, Garth Gibson

Overview

• NFS over RDMA was recently released in February 2008
• What is the value of RDMA to storage users?
• Competing networks:

  • General-purpose network (e.g. Ethernet)
  • High-performance network with RDMA (e.g. InfiniBand)

NFS over IPoIB / NFS over RDMA

• IPoIB: Implemented as a standard network driver
• RDMA: Implemented as a new RPC type (see the mount sketch below)
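For concreteness, here is a minimal client-side sketch of the two configurations on Linux, assuming an NFS/RDMA-capable kernel, a server reachable over its IPoIB address, and the conventional NFS/RDMA port 20049; the hostname, export path, and mount points are hypothetical, not taken from the poster:

```python
import subprocess

# Hypothetical names for illustration only; adjust for the actual test bed.
SERVER = "nfs-server-ib0"   # server's IPoIB address or hostname
EXPORT = "/srv/nfs"         # directory exported by the NFS server

def mount_nfs(mountpoint, options):
    """Invoke mount(8) for the export with the given NFS option string (needs root)."""
    subprocess.run(
        ["mount", "-t", "nfs", "-o", options, f"{SERVER}:{EXPORT}", mountpoint],
        check=True,
    )

# NFS over IPoIB: ordinary TCP RPC; the IP traffic simply rides the IPoIB interface.
mount_nfs("/mnt/nfs-ipoib", "proto=tcp,rsize=65536,wsize=65536")

# NFS over RDMA: the RPC layer uses the RDMA transport, which conventionally
# listens on port 20049 instead of the usual NFS port 2049.
mount_nfs("/mnt/nfs-rdma", "proto=rdma,port=20049,rsize=65536,wsize=65536")
```

The 64 KB rsize/wsize matches the 64 KB per-operation size used in the experiments below.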

Experiment Setup

• For point-to-point throughput, IP over InfiniBand (Connected Mode) is comparable to native InfiniBand

Experimental Results

• When the disk is the bottleneck, NFS benefits from neither IPoIB nor RDMA

• As the number of concurrent read operations increases, the aggregate throughput achieved with both IPoIB and RDMA improves significantly, with no disadvantage for IPoIB

• When the disk is not the bottleneck, NFS benefits significantly from both IPoIB and RDMA

• RDMA is better than IPoIB by ~20%

Figure: NFS/RPC software stack. NFSv2, NFSv3, NFSv4, and NFSv4.1 all sit on Open Network Computing Remote Procedure Call (ONC RPC). ONC RPC runs either as TCP RPC (TCP over IP, via the Ethernet card driver and Ethernet card, or via the IPoIB driver and InfiniBand card) or as RDMA RPC (RDMA Verbs with the IB CM, iWarp CM, or RDMA CM on an InfiniBand/iWarp card).
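Which transport a client mount ended up with is visible from user space; a small sketch, assuming a Linux client where NFS mount options such as proto=tcp or proto=rdma appear in /proc/mounts:

```python
def nfs_transports():
    """Map each NFS mount point to the RPC transport reported in /proc/mounts."""
    results = {}
    with open("/proc/mounts") as mounts:
        for line in mounts:
            _device, mountpoint, fstype, options = line.split()[:4]
            if fstype in ("nfs", "nfs4"):
                proto = next((opt.split("=", 1)[1]
                              for opt in options.split(",")
                              if opt.startswith("proto=")), "unknown")
                results[mountpoint] = proto   # e.g. "tcp" for Ethernet/IPoIB, "rdma" for RDMA RPC
    return results

if __name__ == "__main__":
    for mountpoint, proto in nfs_transports().items():
        print(f"{mountpoint}: proto={proto}")
```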

Setup: one NFS server and NFS clients 1-4, connected through either 1) a Gigabit Ethernet switch or 2) an InfiniBand SDR switch; the server's exported storage is either 1) ext2 on a single 160GB SATA disk or 2) tmpfs on 4GB RAM.

Figure: Point-to-Point Throughput with iPerf and ib_bw. Throughput (MB/s) versus bytes per send() call (2 B to 2 MB). Series: InfiniBand (Reliable Connection) – ib_bw; Socket Direct Protocol (SDP) – iPerf; IP over InfiniBand (Connected Mode) – iPerf; IP over InfiniBand (Connected Mode) w/ 2 streams – iPerf; IP over InfiniBand (Datagram Mode) – iPerf; 1 Gigabit Ethernet – iPerf.
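As a rough analogue of the iPerf runs above (a sketch, not the actual iPerf or ib_bw tools), the script below streams a fixed amount of data over a plain TCP socket and reports throughput for a given buffer size per send() call; whether the traffic crosses Ethernet or IPoIB depends only on the address the receiver binds to. Host, port, transfer size, and buffer sizes are illustrative:

```python
import socket
import threading
import time

HOST, PORT = "127.0.0.1", 50007    # bind to an Ethernet or IPoIB address to pick the path
TOTAL_BYTES = 256 * 1024 * 1024    # amount of data per measurement (illustrative)

def receiver(ready):
    """Accept one connection and drain it until the sender closes."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((HOST, PORT))
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    while conn.recv(1 << 20):
        pass
    conn.close()
    srv.close()

def measure(bytes_per_send):
    """Return achieved throughput in MB/s for the given buffer size per send."""
    ready = threading.Event()
    thread = threading.Thread(target=receiver, args=(ready,))
    thread.start()
    ready.wait()
    buf = b"x" * bytes_per_send
    sock = socket.create_connection((HOST, PORT))
    sent, start = 0, time.time()
    while sent < TOTAL_BYTES:
        sock.sendall(buf)          # hand the kernel one buffer of the chosen size
        sent += len(buf)
    sock.close()
    thread.join()                  # wait until the receiver has drained everything
    return sent / (time.time() - start) / 1e6

if __name__ == "__main__":
    for size in (4 * 1024, 64 * 1024, 1024 * 1024):   # bytes per send() call
        print(f"{size:>8} B/send: {measure(size):8.1f} MB/s")
```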

Figure: Sequential Write to Server's Disk – Read from Server's Disk; 5 GB file with 64 KB per operation (ext2 on single 160GB SATA). Throughput (MB/s) for write and read with local disk, eth, ipoib, and rdma. All configurations fall between roughly 25 and 62 MB/s; the disk, not the network, sets the limit.

Figure: Sequential Write to Server's Disk – Read from Server's Cache; 512 MB file with 64 KB per operation (ext2 on single 160GB SATA). Throughput (MB/s) for write and read with local disk, eth, ipoib, and rdma; reference lines mark iperf(eth) ~110, iperf(ipoib) ~650, and ib_send_bw ~940 MB/s. Reads served from the server's cache reach about 398 MB/s over IPoIB and 479 MB/s over RDMA (roughly 20% higher), far beyond what Gigabit Ethernet can carry, and the local page-cache read (2636.51 MB/s) runs off the chart; writes remain limited by the server's disk.
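A minimal sketch of such a sequential test, assuming an NFS export is already mounted at a hypothetical mount point: it writes a file in 64 KB operations, syncs it to the server, then reads it back in 64 KB operations and reports MB/s. To exercise the server's cache rather than the client's on the read pass, the client-side page cache would have to be cleared between the two phases (for example by remounting):

```python
import os
import time

MOUNTPOINT = "/mnt/nfs-rdma"           # hypothetical NFS mount under test
PATH = os.path.join(MOUNTPOINT, "seqtest.dat")
OP_SIZE = 64 * 1024                    # 64 KB per operation, as in the poster
FILE_SIZE = 512 * 1024 * 1024          # 512 MB file, small enough for the server's RAM

def sequential_write(path):
    """Write FILE_SIZE bytes in OP_SIZE chunks and return MB/s."""
    buf = b"\0" * OP_SIZE
    start = time.time()
    with open(path, "wb") as f:
        for _ in range(FILE_SIZE // OP_SIZE):
            f.write(buf)
        f.flush()
        os.fsync(f.fileno())           # make sure the data has reached the server
    return FILE_SIZE / (time.time() - start) / 1e6

def sequential_read(path):
    """Read the file back in OP_SIZE chunks and return MB/s."""
    start = time.time()
    with open(path, "rb") as f:
        while f.read(OP_SIZE):
            pass
    return FILE_SIZE / (time.time() - start) / 1e6

print(f"write: {sequential_write(PATH):.1f} MB/s")
print(f"read:  {sequential_read(PATH):.1f} MB/s")
```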

Figure: Parallel Sequential Read from Server's Cache; 512 MB file with 64 KB per operation (ext2 on single 160GB SATA). Aggregate throughput (MB/s):

                          eth   ipoib   rdma
  1 client  x 1 thread    108     405    456
  1 client  x 2 threads   112     429    600
  1 client  x 4 threads   112     453    583
  2 clients x 1 thread    111     717    635
  2 clients x 2 threads   112     719    699
  4 clients x 1 thread    111     759    758

Reference lines: iperf(eth) ~110, iperf(ipoib) ~650, iperf(ipoib w/ 2 streams) ~980, ib_send_bw ~940 MB/s.
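A sketch of the single-client side of this measurement, assuming the per-thread 512 MB files were written beforehand to the hypothetical mount point and the client cache was cleared; the multi-client rows above would repeat this on several machines and sum the results:

```python
import time
from concurrent.futures import ThreadPoolExecutor

MOUNTPOINT = "/mnt/nfs-rdma"   # hypothetical NFS mount under test
OP_SIZE = 64 * 1024            # 64 KB per operation
N_THREADS = 4                  # e.g. 1, 2, or 4 concurrent readers on this client

def read_file(path):
    """Sequentially read one file in OP_SIZE chunks; return the bytes read."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(OP_SIZE)
            if not chunk:
                return total
            total += len(chunk)

# One pre-created 512 MB file per thread, so the streams do not share data
# cached on the client (file names are illustrative).
paths = [f"{MOUNTPOINT}/seqtest{i}.dat" for i in range(N_THREADS)]

start = time.time()
with ThreadPoolExecutor(max_workers=N_THREADS) as pool:
    total_bytes = sum(pool.map(read_file, paths))
elapsed = time.time() - start
print(f"aggregate: {total_bytes / elapsed / 1e6:.1f} MB/s across {N_THREADS} threads")
```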

Type                   Bandwidth (Gbps)   ~Latency (µs)   ~Price per NIC+Port ($)
Gigabit Ethernet               1                40                       40
10 Gigabit Ethernet           10                40                    1,350
InfiniBand 4X SDR              8                 4                      600
InfiniBand 4X DDR             16                 4                      720
InfiniBand 4X QDR             32                 4                    1,200

Source: High-Performance Systems Integration group, Los Alamos National Laboratory (HPC-5, LANL)