Sonoma Feb 6, 2006

Reliable Datagram Sockets (RDS)

Ranjit Pandit

SilverStorm Technologies

rpandit@silverstorm.com

Agenda

• Goals
• High Level Design
• Current status
• Preliminary performance data
• Future work

Goals

• Provide reliable datagram service
  – performance
  – scalability
  – high availability
  – simplify application code

• Maintain sockets API
  – application code portability
  – faster time-to-market

Keep It Simple!

Stack Overview

[Diagram: protocol stack. User space: Oracle 10g, socket applications, UDP applications. Kernel: TCP, UDP, SDP and RDS; IP; IPoIB; OpenIB access layer. Hardware: host channel adapter.]

High Level Design

• RDS registers with the kernel as the driver for address family PF_INET_OFFLOAD and type SOCK_DGRAM

• Application creates an RDS socket with socket(2)
  – arg 1 = protocol family = PF_INET_OFFLOAD
  – arg 2 = type = SOCK_DGRAM

• Sockets API calls supported
  – socket, bind, ioctl, sendmsg, recvmsg, poll, getsockopt/setsockopt
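
The socket-creation step can be sketched as follows. This is a minimal sketch, not the RDS implementation. Python's socket module does not define PF_INET_OFFLOAD and the slides do not give its numeric value, so the address family is a parameter and the demo passes AF_INET (plain UDP) purely so that it runs on any host. The helper name open_datagram_socket is hypothetical.

```python
import socket

def open_datagram_socket(family, local_addr):
    """Create and bind a SOCK_DGRAM socket for the given address family."""
    sock = socket.socket(family, socket.SOCK_DGRAM)
    sock.bind(local_addr)
    return sock

# Assumption: on a host with RDS loaded, `family` would be the platform's
# PF_INET_OFFLOAD value. AF_INET (UDP) is used here only so the sketch runs.
receiver = open_datagram_socket(socket.AF_INET, ("127.0.0.1", 0))
sender = open_datagram_socket(socket.AF_INET, ("127.0.0.1", 0))
receiver.settimeout(2.0)              # avoid hanging if the datagram is lost
sender_addr = sender.getsockname()

# sendmsg/recvmsg are among the calls the slide lists as supported.
sender.sendmsg([b"hello"], [], 0, receiver.getsockname())
data, ancdata, flags, peer = receiver.recvmsg(2048)

sender.close()
receiver.close()
```

Only the family argument changes between the UDP and the offloaded case, which is the portability argument made on the Goals slide.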

Connection model

• Application connectionless
• RDS maintains node-to-node connection
• IP addressing
• Uses CMA
• On-demand connection setup
  – connect on first sendmsg() or data recv
  – disconnect on error or on policy, such as inactivity
• Connection setup/teardown transparent to applications
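
The on-demand connection model can be illustrated with a toy table keyed by remote node. This is a sketch under stated assumptions, not the RDS code: the class name, the 30-second idle timeout and the injected clock are all hypothetical, and no real connection is made.

```python
import time

class NodeConnectionTable:
    """Toy model: one connection per remote node, shared by all local sockets."""

    def __init__(self, idle_timeout=30.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout   # hypothetical inactivity policy
        self.clock = clock
        self.last_used = {}                # remote IP -> time of last send
        self.setups = 0                    # connection setups performed so far

    def send(self, remote_ip, payload):
        now = self.clock()
        self._reap_idle(now)
        if remote_ip not in self.last_used:
            self.setups += 1               # connect on first send to this node
        self.last_used[remote_ip] = now
        return len(payload)

    def _reap_idle(self, now):
        idle = [ip for ip, t in self.last_used.items()
                if now - t > self.idle_timeout]
        for ip in idle:
            del self.last_used[ip]         # disconnect on inactivity

fake_now = [0.0]                           # controllable clock for the demo
table = NodeConnectionTable(idle_timeout=30.0, clock=lambda: fake_now[0])
table.send("10.0.0.2", b"a")               # first send: connection set up
table.send("10.0.0.2", b"b")               # reuses the existing connection
fake_now[0] = 100.0
table.send("10.0.0.2", b"c")               # idle too long: set up again
```

The caller only ever calls send(); setup and teardown happen inside the table, which is what "transparent to applications" means here.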

Data and Control Channel

• Uses RC QP for node-level connections
• Data and Control QPs per session
• Selectable MTU
• b-copy send/recv
• h/w flow control

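[Diagram: Node 1 and Node 2, each split into user and kernel space. Labels: processes P1, P2, … Pn; sockets s1, s2, … sn; Rds in each kernel; an RC QP on each node; sendmsg(node2) on Node 1; recvmsg() on Node 2.]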

Send

• Connection established on first send

• sendmsg()
  – allows send pipelining

• ENOBUFS returned if insufficient send buffers; application retries
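
The retry-on-buffer-exhaustion behaviour expected of the application can be sketched as a small wrapper. This is an illustration, not part of RDS: send_with_retry, the attempt limit and the exponential backoff are assumptions, and flaky_send is a stand-in for a real sendmsg() call. The error code is the one Python exposes as errno.ENOBUFS.

```python
import errno
import time

def send_with_retry(send_once, max_attempts=5, backoff_s=0.001, sleep=time.sleep):
    """Call send_once() until it succeeds, retrying only on ENOBUFS."""
    for attempt in range(max_attempts):
        try:
            return send_once()
        except OSError as exc:
            # Re-raise anything that is not buffer exhaustion, or the last failure.
            if exc.errno != errno.ENOBUFS or attempt == max_attempts - 1:
                raise
            sleep(backoff_s * (2 ** attempt))  # hypothetical backoff policy

# Stand-in for sock.sendmsg(...): fails twice with ENOBUFS, then succeeds.
attempts = []

def flaky_send():
    attempts.append(1)
    if len(attempts) < 3:
        raise OSError(errno.ENOBUFS, "no send buffers")
    return 5

sent = send_with_retry(flaky_send, sleep=lambda seconds: None)
```

In real use, send_once would be something like `lambda: sock.sendmsg([payload], [], 0, dest)`.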

Receive

• Identical to UDP recvmsg()
  – similar blocking/non-blocking behavior

• “Slow” receiver ports are stalled at sender side
  – combination of activity (LRU) and memory utilization used to detect slow receivers
  – sendmsg() to stalled destination port returns EWOULDBLOCK, application can retry
    • Blocking socket can wait for unblock
  – recvmsg() on a stalled port un-stalls it
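
The stall and un-stall cycle can be modelled with a toy port object. This is a deliberately simplified sketch: it stalls a port on queued bytes alone, whereas the slide says detection combines activity (LRU) with memory utilization. The class name and the byte threshold are hypothetical.

```python
import errno
from collections import deque

class ReceivePort:
    """Toy model of one destination port as tracked by the sender side."""

    def __init__(self, max_queued_bytes=4096):
        self.max_queued_bytes = max_queued_bytes  # hypothetical threshold
        self.queue = deque()
        self.queued_bytes = 0
        self.stalled = False

    def sendmsg(self, payload):
        if self.stalled or self.queued_bytes + len(payload) > self.max_queued_bytes:
            self.stalled = True
            raise BlockingIOError(errno.EWOULDBLOCK, "destination port stalled")
        self.queue.append(payload)
        self.queued_bytes += len(payload)
        return len(payload)

    def recvmsg(self):
        payload = self.queue.popleft()
        self.queued_bytes -= len(payload)
        self.stalled = False                      # a receive un-stalls the port
        return payload

port = ReceivePort(max_queued_bytes=8)
port.sendmsg(b"12345")                # queued
try:
    port.sendmsg(b"67890")            # would exceed 8 bytes: port stalls
    blocked = False
except BlockingIOError:
    blocked = True
received = port.recvmsg()             # receiver drains a message, un-stalling
retried = port.sendmsg(b"67890")      # the retry now succeeds
```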

High Availability (failover)

• Use of RC and on-demand connection setup allows HA
  – connection setup/teardown transparent to applications
  – every sendmsg() could “potentially” result in a connection setup
  – if a path fails, connection is torn down, next send can connect on an alternate path (different port or different HCA)
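
The failover behaviour can be sketched as a sender that reconnects lazily on the next usable path. This is a toy model under stated assumptions: FailoverSender, the (HCA, port) tuples and the first-available selection order are hypothetical, and nothing here touches real hardware.

```python
class FailoverSender:
    """Toy model: after a path failure, the next send connects on another path."""

    def __init__(self, paths):
        self.paths = list(paths)      # e.g. (HCA, port) pairs; names are hypothetical
        self.active = None            # index of the connected path, or None
        self.failed = set()

    def fail_path(self, path):
        self.failed.add(path)
        if self.active is not None and self.paths[self.active] == path:
            self.active = None        # path failed: connection is torn down

    def sendmsg(self, payload):
        if self.active is None:       # any send may trigger a connection setup
            for index, path in enumerate(self.paths):
                if path not in self.failed:
                    self.active = index
                    break
            else:
                raise ConnectionError("no usable path")
        return self.paths[self.active], len(payload)

ha = FailoverSender([("hca0", 1), ("hca0", 2), ("hca1", 1)])
first = ha.sendmsg(b"x")              # connects on the first path
ha.fail_path(("hca0", 1))             # that path fails
second = ha.sendmsg(b"y")             # next send connects on an alternate path
```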

Preliminary performance: RDS on OpenIB

[Chart: netperf (UDP_STREAM) throughput in Mbits/sec (axis 0 to 4000) versus msg size (2k to 64K bytes). Series: UDP GbE, UDP ipoib send, UDP ipoib recv, Rds (send = recv).]

* Dual 2.4GHz Xeon, 2G memory, 4x PCI-X HCA

** Sdp ~3700 Mb/sec TCP_STREAM

Preliminary performance: RDS on OpenIB

[Chart: netperf (UDP_STREAM) throughput in Mbits/sec (axis 0 to 4000) versus msg size (2k to 64K bytes). Series: UDP GbE, UDP ipoib recv, Rds (send = recv).]

* Dual 2.4GHz Xeon, 2G memory, 4x PCI-X HCA

** Sdp ~3700 Mb/sec TCP_STREAM

Preliminary performance: RDS on OpenIB

[Chart: latency in usec (axis 0 to 500) versus msg size (4 to 32768 bytes). Series: UDP GigE, UDP ipoib, Rds.]

Status in OpenIB

• Z-copy
• Functionally 98% complete
• Running Netperf
• Running Oracle unit test (crload) stable today
• Code checked into contrib/silverstorm/
  https://openib.org/svn/trunk/contrib/silverstorm/rds/

Future

• AIO

• Z-copy

• Shared recv queue

Q&A