![Page 1: Sonoma Feb 6, 2006 Reliable Datagram Sockets (RDS) Ranjit Pandit SilverStorm Technologies rpandit@silverstorm.com](https://reader036.vdocuments.us/reader036/viewer/2022083006/56649f305503460f94c4bbf7/html5/thumbnails/1.jpg)
Sonoma Feb 6, 2006
Reliable Datagram Sockets (RDS)
Ranjit Pandit
SilverStorm Technologies
Agenda
• Goals
• High Level Design
• Current status
• Preliminary performance data
• Future work
Goals
• Provide reliable datagram service
  – performance
  – scalability
  – high availability
  – simplify application code
• Maintain sockets API
  – application code portability
  – faster time-to-market
Keep It Simple !!!
Stack Overview
[Figure: stack diagram. Labels above the User/Kernel boundary: Oracle 10g, Socket Applications, UDP Applications. Labels in the kernel: TCP, UDP, SDP, RDS; IP; IPoIB; OpenIB Access Layer; Host Channel Adapter.]
High Level Design
• RDS registers with the kernel as the driver for address family PF_INET_OFFLOAD and type SOCK_DGRAM
• Application creates an RDS socket with socket(2)
  – arg 1 = PF = PF_INET_OFFLOAD
  – arg 2 = Type = SOCK_DGRAM
• socket(2) API supported
  – socket, bind, ioctl, sendmsg, recvmsg, poll, getsockopt/setsockopt
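The socket creation and datagram calls listed above can be sketched in Python. This is only an illustration under stated assumptions: the slides give no numeric value for PF_INET_OFFLOAD, so the constant below is a placeholder left unset, and the helper names `open_datagram_socket` and `loopback_roundtrip` are hypothetical. When no RDS family is configured, the sketch falls back to ordinary UDP so that it still runs.

```python
import socket

# Assumption: the slides name PF_INET_OFFLOAD but give no numeric value.
# On a real system it would come from the RDS driver's headers; None here
# means "not configured", and the sketch falls back to plain UDP.
PF_INET_OFFLOAD = None


def open_datagram_socket(family=PF_INET_OFFLOAD):
    """Return a SOCK_DGRAM socket, using UDP when no RDS family is available."""
    if family is not None:
        try:
            return socket.socket(family, socket.SOCK_DGRAM)
        except OSError:
            pass  # address family not registered on this host
    return socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


def loopback_roundtrip(payload=b"hello"):
    """Exercise bind, sendmsg and recvmsg on one socket over loopback."""
    sock = open_datagram_socket()
    try:
        sock.settimeout(2.0)
        sock.bind(("127.0.0.1", 0))           # port 0: let the kernel pick one
        addr = sock.getsockname()
        sock.sendmsg([payload], [], 0, addr)  # datagram addressed to ourselves
        data, _ancdata, _flags, sender = sock.recvmsg(2048)
        return data, sender == addr
    finally:
        sock.close()
```

Only the address family differs from a normal UDP program, which is the portability point the slides make.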
Connection model
• Applications are connectionless
• RDS maintains node-to-node connections
• IP addressing
• Uses CMA
• On-demand connection setup
  – connect on first sendmsg() or data recv
  – disconnect on error or policy, such as inactivity
• Connection setup/teardown is transparent to applications
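The on-demand model above can be illustrated with a toy Python class. It is not the RDS implementation: the class name `NodeConnections`, the idle timeout and the injectable clock are all assumptions made for the example, and no real connection is opened. It only shows connect-on-first-send and teardown-on-inactivity happening behind a connectionless `send()`.

```python
import time


class NodeConnections:
    """Toy model of per-node connections set up on demand."""

    def __init__(self, idle_timeout=30.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self.last_used = {}   # destination IP -> time of last send
        self.setups = 0       # how many connection setups happened

    def send(self, dest_ip, payload):
        if dest_ip not in self.last_used:
            self.setups += 1              # first send to this node: connect
        self.last_used[dest_ip] = self.clock()
        return len(payload)               # pretend the datagram was sent

    def reap_idle(self):
        """Tear down connections unused for longer than idle_timeout."""
        now = self.clock()
        idle = [ip for ip, t in self.last_used.items()
                if now - t > self.idle_timeout]
        for ip in idle:
            del self.last_used[ip]
        return idle
```

The caller never sees a connect or disconnect step; a send after teardown simply triggers a new setup.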
Data and Control Channel
• Uses RC QP for node level connections
• Data and Control QPs per session
• Selectable MTU
• b-copy send/recv
• h/w flow control
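[Figure: two-node diagram. Node 1: user processes P1, P2, … Pn with sockets s1, s2, … sn, one calling sendmsg(node2); Rds in the kernel. Node 2: user process P1 with socket S1 calling recvmsg(); Rds in the kernel. The two Rds instances are joined by RC QPs.]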
Send
• Connection established on first send
• sendmsg()
  – allows send pipelining
• ENOBUFS returned if send buffers are insufficient; application retries
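The retry rule above might look like this in application code. This is a minimal sketch, assuming the error surfaces as an `OSError` carrying `errno.ENOBUFS`, as Python does for a failed `sendmsg()`. The helper name `send_with_retry`, the attempt count and the delay are illustrative choices, not values from the slides.

```python
import errno
import time


def send_with_retry(sock, payload, addr, attempts=5, delay=0.01):
    """Call sendmsg(), retrying while the stack reports ENOBUFS."""
    for attempt in range(attempts):
        try:
            return sock.sendmsg([payload], [], 0, addr)
        except OSError as exc:
            if exc.errno != errno.ENOBUFS or attempt == attempts - 1:
                raise                      # other error, or out of attempts
            time.sleep(delay)              # give send buffers time to drain
```

Any error other than ENOBUFS is re-raised at once, so the loop does not mask real failures.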
Receive
• Identical to UDP recvmsg()
  – similar blocking/non-blocking behavior
• “Slow” receiver ports are stalled at sender side
  – combination of activity (LRU) and memory utilization used to detect slow receivers
  – sendmsg() to stalled destination port returns EWOULDBLOCK, application can retry
    • Blocking socket can wait for unblock
  – recvmsg() on a stalled port un-stalls it
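The stall and un-stall behavior can be modeled with a small Python class. This is a deliberate simplification: the slides say slow receivers are detected from activity (LRU) combined with memory utilization, while this sketch uses only a queued-bytes threshold. The class name `StalledPortModel` and the `max_queued_bytes` parameter are hypothetical.

```python
import errno
from collections import deque


class StalledPortModel:
    """Toy sender-side view of one destination port."""

    def __init__(self, max_queued_bytes=4096):
        self.max_queued_bytes = max_queued_bytes
        self.queue = deque()
        self.queued_bytes = 0
        self.stalled = False

    def sendmsg(self, payload):
        if self.stalled:
            # Mirrors the EWOULDBLOCK the slides describe for a stalled port.
            raise BlockingIOError(errno.EWOULDBLOCK, "destination port is stalled")
        self.queue.append(payload)
        self.queued_bytes += len(payload)
        if self.queued_bytes > self.max_queued_bytes:
            self.stalled = True            # receiver is not keeping up
        return len(payload)

    def recvmsg(self):
        payload = self.queue.popleft()     # raises IndexError if nothing is queued
        self.queued_bytes -= len(payload)
        self.stalled = False               # a receive un-stalls the port
        return payload
```

A sender that catches `BlockingIOError` can retry later, and the next `recvmsg()` on the receiving side lets sends through again.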
High Availability (failover)
• Use of RC and on-demand connection setup allows HA
  – connection setup/teardown transparent to applications
  – every sendmsg() could “potentially” result in a connection setup
  – if a path fails, connection is torn down, next send can connect on an alternate path (different port or different HCA)
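The failover sequence above can be sketched as a toy Python model. Nothing here talks to real hardware: the class name `FailoverConnection`, the `(HCA, port)` path tuples and the `path_is_up` callback are assumptions made for the example. It shows a failed path causing teardown and the next send reconnecting on an alternate path.

```python
class FailoverConnection:
    """Toy model: one node-level connection that can move between paths."""

    def __init__(self, paths, path_is_up):
        self.paths = list(paths)        # e.g. (HCA, port) pairs, in preference order
        self.path_is_up = path_is_up    # callable: path -> bool
        self.active = None              # None means "not connected"

    def _connect(self):
        for path in self.paths:
            if self.path_is_up(path):
                self.active = path
                return
        raise ConnectionError("no usable path to peer")

    def send(self, payload):
        if self.active is not None and not self.path_is_up(self.active):
            self.active = None          # path failed: tear the connection down
        if self.active is None:
            self._connect()             # this send triggers connection setup
        return self.active, len(payload)
```

The application only ever calls `send()`; which path carried the datagram is an internal detail, matching the transparency the slide describes.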
Preliminary performance: Rds on OpenIB
netperf (UDP_STREAM)
[Chart: throughput in Mbits/sec (axis 0 to 4000) versus msg size in bytes (2k, 4k, 8k, 16k, 32K, 64K). Series: UDP GbE, UDP ipoib send, UDP ipoib recv, Rds (send = recv).]
*Dual 2.4GHz Xeon, 2G memory, 4x PCI-X HCA
**Sdp ~3700 Mb/sec (TCP_STREAM)
Preliminary performance: Rds on OpenIB
netperf (UDP_STREAM)
[Chart: throughput in Mbits/sec (axis 0 to 4000) versus msg size in bytes (2k, 4k, 8k, 16k, 32K, 64K). Series: UDP GbE, UDP ipoib recv, Rds (send = recv).]
*Dual 2.4GHz Xeon, 2G memory, 4x PCI-X HCA
**Sdp ~3700 Mb/sec (TCP_STREAM)
Preliminary performance: Rds on OpenIB
Latency
[Chart: latency in usec (axis 0 to 500) versus msg size in bytes (4 to 32768, doubling at each step). Series: UDP GigE, UDP ipoib, Rds.]
Status in OpenIB
• Z-copy
• Functionally 98% complete
• Running Netperf
• Running Oracle unit test (crload) stable today
• Code checked into contrib/silverstorm/
  https://openib.org/svn/trunk/contrib/silverstorm/rds/
Future
• AIO
• Z-copy
• Shared recv queue
Q&A