
Network File System (NFS) in High Performance Networks

PDLSVD08-02

Wittawat Tantisiriroj, Garth Gibson

Overview

• NFS over RDMA was recently released in February 2008
• What is the value of RDMA to storage users?
• Competing networks:
   • General-purpose network (e.g., Ethernet)
   • High-performance network with RDMA (e.g., InfiniBand)

NFS over IPoIB / NFS over RDMA

• IPoIB: implemented as a standard network driver, so the existing TCP RPC transport runs on top of it unchanged
• RDMA: implemented as a new RPC transport type

Experiment Setup

• For point-to-point throughput, IP over InfiniBand (Connected Mode) is comparable to native InfiniBand

Experimental Results

• When the disk is the bottleneck, NFS benefits from neither IPoIB nor RDMA

• As the number of concurrent read operations increases, the aggregate throughput achieved with both IPoIB and RDMA improves significantly, with no disadvantage for IPoIB

• When the disk is not the bottleneck, NFS benefits significantly from both IPoIB and RDMA

• RDMA outperforms IPoIB by ~20%

[Diagram: NFS/RPC protocol stack. NFSv2, NFSv3, NFSv4, and NFSv4.1 all sit on Open Network Computing Remote Procedure Call (ONC RPC). ONC RPC can use either a TCP RPC transport (TCP → IP → Ethernet card driver → Ethernet card, or TCP → IP → IPoIB driver → InfiniBand card) or an RDMA RPC transport (RDMA verbs with the RDMA CM / IB CM / iWarp CM connection managers → InfiniBand/iWarp card).]

• Testbed: one NFS server and four NFS clients (NFS Client 1–4), connected by 1) a Gigabit Ethernet switch and 2) an InfiniBand SDR switch
• Server storage: 1) ext2 on a single 160 GB SATA disk, 2) tmpfs on 4 GB of RAM

[Figure: Point-to-point throughput with iPerf and ib_bw. X-axis: bytes per send() call (2 B to 2 MB); Y-axis: throughput (MB/s). Series: InfiniBand (Reliable Connection) – ib_bw; Socket Direct Protocol (SDP) – iPerf; IP over InfiniBand (Connected Mode) – iPerf, with 1 and 2 streams; IP over InfiniBand (Datagram Mode) – iPerf; 1 Gigabit Ethernet – iPerf.]
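The curves above come from the standard iPerf and ib_bw tools. As a rough illustration of what the x-axis measures, the sketch below (not part of the original poster) times a plain TCP sender while varying the buffer size handed to each send() call; the server address, port, and buffer sizes are hypothetical placeholders for any sink that drains the connection, such as an iPerf server in TCP mode.

/* Minimal sketch, assuming a receiver is already listening at
 * SERVER_IP:PORT (hypothetical). It reports TCP throughput as a
 * function of the buffer size passed to each send() call, the
 * quantity on the x-axis of the point-to-point figure. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#define SERVER_IP "192.168.1.1"   /* assumption: receiver's address */
#define PORT      5001            /* assumption: iPerf's default TCP port */
#define TOTAL     (1L << 30)      /* send 1 GB per buffer size */

static double now_sec(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void) {
    size_t sizes[] = {64, 1024, 64 * 1024, 1024 * 1024};
    char *buf = calloc(1, sizes[3]);          /* largest buffer we will send */

    for (size_t i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(PORT);
        inet_pton(AF_INET, SERVER_IP, &addr.sin_addr);
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
            perror("connect");
            return 1;
        }

        double start = now_sec();
        long sent = 0;
        while (sent < TOTAL) {                /* push TOTAL bytes, sizes[i] at a time */
            ssize_t n = send(fd, buf, sizes[i], 0);
            if (n <= 0) { perror("send"); break; }
            sent += n;
        }
        double secs = now_sec() - start;
        printf("%zu bytes/send: %.1f MB/s\n",
               sizes[i], sent / secs / (1024.0 * 1024.0));
        close(fd);
    }
    free(buf);
    return 0;
}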

[Figure: Sequential write to server's disk / sequential read from server's disk; 5 GB file with 64 KB per operation (ext2 on a single 160 GB SATA disk). Y-axis: throughput (MB/s). Bars: write and read for local disk, eth, ipoib, rdma. All configurations fall in roughly the 25–62 MB/s range, so throughput is limited by the disk rather than the network.]
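The sequential tests in this figure and the next follow one simple access pattern: a single stream of 64 KB operations against one file on the NFS mount. Below is a minimal sketch of such a read benchmark (not the poster's actual tool; the file path is an assumed placeholder, and the file would be 5 GB for the disk test or 512 MB for the cache test).

/* Minimal sketch (assumptions noted): sequentially read a file on an
 * NFS mount in 64 KB operations and report throughput in MB/s. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

#define OP_SIZE (64 * 1024)             /* 64 KB per read() call */
#define PATH    "/mnt/nfs/testfile"     /* assumption: file on the NFS mount */

static double now_sec(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

int main(void) {
    char *buf = malloc(OP_SIZE);
    int fd = open(PATH, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    double start = now_sec();
    long total = 0;
    ssize_t n;
    while ((n = read(fd, buf, OP_SIZE)) > 0)   /* sequential, one op at a time */
        total += n;
    double secs = now_sec() - start;

    printf("%ld bytes in %.2f s: %.1f MB/s\n",
           total, secs, total / secs / (1024.0 * 1024.0));
    close(fd);
    free(buf);
    return 0;
}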

[Figure: Sequential write to server's disk / sequential read from server's cache; 512 MB file with 64 KB per operation (ext2 on a single 160 GB SATA disk). Y-axis: throughput (MB/s). Bars: write and read for local disk, eth, ipoib, rdma. Writes remain disk-limited at roughly 30–60 MB/s for all configurations. Reads served from the server's cache reach ~108 MB/s over Gigabit Ethernet, ~398 MB/s over IPoIB, and ~479 MB/s over RDMA, versus ~2,637 MB/s for the local cache. Reference lines: iperf(eth) ~110 MB/s, iperf(ipoib) ~650 MB/s, ib_send_bw ~940 MB/s.]

[Figure: Parallel sequential read from server's cache; 512 MB file with 64 KB per operation (ext2 on a single 160 GB SATA disk). Aggregate throughput (MB/s):

                          eth   ipoib   rdma
  1 client x 1 thread     108    405    456
  1 client x 2 threads    112    429    600
  1 client x 4 threads    112    453    583
  2 clients x 1 thread    111    717    635
  2 clients x 2 threads   112    719    699
  4 clients x 1 thread    111    759    758

Reference lines: iperf(eth) ~110 MB/s, iperf(ipoib) ~650 MB/s, ib_send_bw ~940 MB/s, iperf(ipoib, 2 streams) ~980 MB/s.]
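As a rough companion to the parallel test above, the sketch below (not the poster's tool) runs several reader threads, each streaming its own file in 64 KB operations, and reports the aggregate rate. File names and the thread count are assumptions, and the multi-client cases in the figure ran on separate machines rather than as threads in one process.

/* Minimal sketch (assumptions noted): aggregate sequential read
 * throughput with several concurrent reader threads, each issuing
 * 64 KB read() calls against its own file on the NFS mount. */
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

#define OP_SIZE  (64 * 1024)    /* 64 KB per read() call */
#define NTHREADS 4              /* assumption: number of concurrent readers */

static long totals[NTHREADS];   /* bytes read by each thread */

static double now_sec(void) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec + tv.tv_usec / 1e6;
}

static void *reader(void *arg) {
    long idx = (long)arg;
    char path[64], *buf = malloc(OP_SIZE);
    snprintf(path, sizeof(path), "/mnt/nfs/testfile.%ld", idx);  /* assumption */

    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); free(buf); return NULL; }

    ssize_t n;
    while ((n = read(fd, buf, OP_SIZE)) > 0)
        totals[idx] += n;
    close(fd);
    free(buf);
    return NULL;
}

int main(void) {
    pthread_t tids[NTHREADS];
    double start = now_sec();

    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&tids[i], NULL, reader, (void *)i);
    for (long i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);

    double secs = now_sec() - start;
    long total = 0;
    for (int i = 0; i < NTHREADS; i++)
        total += totals[i];

    printf("aggregate: %.1f MB/s\n", total / secs / (1024.0 * 1024.0));
    return 0;
}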

Type                  Bandwidth (Gbps)   ~Latency (µs)   ~Price per NIC+Port ($)
Gigabit Ethernet              1                 40                    40
10 Gigabit Ethernet          10                 40                 1,350
InfiniBand 4X SDR             8                  4                   600
InfiniBand 4X DDR            16                  4                   720
InfiniBand 4X QDR            32                  4                 1,200

Source: High-Performance Systems Integration group, Los Alamos National Laboratory (HPC-5, LANL)
