Sonoma Feb 6, 2006
Reliable Datagram Sockets (RDS)
Ranjit Pandit
SilverStorm Technologies
rpandit@silverstorm.com
Agenda
• Goals
• High Level Design
• Current status
• Preliminary performance data
• Future work
Goals
• Provide reliable datagram service
– performance
– scalability
– high availability
– simplify application code
• Maintain sockets API
– application code portability
– faster time-to-market
Keep It Simple !!!
Stack Overview
[Diagram: protocol stack. User space: Oracle 10g, socket applications, UDP applications. Kernel: TCP, UDP, SDP, RDS; IP; IPoIB; OpenIB access layer. Hardware: host channel adapter.]
High Level Design
• RDS registers with the kernel as the driver for address family PF_INET_OFFLOAD and type SOCK_DGRAM
• Application creates an RDS socket with socket(2)
– arg 1 = PF = PF_INET_OFFLOAD
– arg 2 = Type = SOCK_DGRAM
• Sockets API calls supported
– socket, bind, ioctl, sendmsg, recvmsg, poll, getsockopt/setsockopt
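The socket(2) call above can be sketched in Python as follows. The slides do not give the numeric value of PF_INET_OFFLOAD, so the constant below is only a placeholder assumption, and the helper name open_rds_socket is hypothetical. On a host where no driver is registered for that family and type, the call fails and the helper returns None.

```python
import socket

# Assumption: the slides do not state the numeric value of PF_INET_OFFLOAD.
# This placeholder is hypothetical; take the real value from the RDS driver.
PF_INET_OFFLOAD = 28


def open_rds_socket(family=PF_INET_OFFLOAD):
    """Try to create a datagram socket in the given address family.

    Returns the socket, or None if the kernel rejects the family/type
    pair (for example, because the RDS driver is not loaded).
    """
    try:
        return socket.socket(family, socket.SOCK_DGRAM)
    except OSError:
        return None


sock = open_rds_socket()
if sock is None:
    print("address family not available on this host")
else:
    print("socket created")
    sock.close()
```

Because only the address family differs from a UDP socket, the rest of the application code can stay on the ordinary sockets API, which is the portability goal stated earlier.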
Connection model
• Application is connectionless
• RDS maintains node-to-node connections
• IP addressing
• Uses CMA
• On-demand connection setup
– connect on first sendmsg() or data recv
– disconnect on error or policy, such as inactivity
• Connection setup/teardown is transparent to applications
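The on-demand model above can be illustrated with a toy in-memory sketch. It is not the RDS implementation: the class name NodeConnections, the idle_timeout policy value, and the example addresses are all assumptions made for illustration, and the "connection setup" is just a counter.

```python
import time


class NodeConnections:
    """Toy model of per-node connections that are set up on demand."""

    def __init__(self, idle_timeout=30.0, clock=time.monotonic):
        self.idle_timeout = idle_timeout  # hypothetical inactivity policy
        self.clock = clock
        self.last_used = {}  # remote IP -> time of last activity
        self.setups = 0      # number of connection setups performed

    def send(self, remote_ip, payload):
        """Send to a node, connecting first if no connection exists."""
        if remote_ip not in self.last_used:
            self.setups += 1  # stands in for the real connection setup
        self.last_used[remote_ip] = self.clock()
        return len(payload)

    def on_error(self, remote_ip):
        """Drop the connection to a node after a transport error."""
        self.last_used.pop(remote_ip, None)

    def reap_idle(self):
        """Drop connections unused for longer than idle_timeout."""
        now = self.clock()
        idle = [ip for ip, t in self.last_used.items()
                if now - t > self.idle_timeout]
        for ip in idle:
            del self.last_used[ip]


conns = NodeConnections()
conns.send("192.0.2.2", b"a")  # first send: connection is set up
conns.send("192.0.2.2", b"b")  # reuses the existing connection
conns.on_error("192.0.2.2")    # an error tears the connection down
conns.send("192.0.2.2", b"c")  # next send reconnects; caller sees no change
print(conns.setups)            # prints 2
```

The caller only ever invokes send(); setup and teardown happen behind it, which is what "transparent to applications" means here.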
Data and Control Channel
• Uses RC QP for node-level connections
• Data and Control QPs per session
• Selectable MTU
• b-copy send/recv
• h/w flow control
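[Diagram: on Node 1, user processes P1, P2 … Pn use sockets s1, s2 … sn, and one process calls sendmsg(node2). On Node 2, process P1 calls recvmsg() on socket S1. The Rds modules in the two kernels are linked by an RC QP on each node.]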
Send
• Connection established on first send
• sendmsg()
– allows send pipelining
• ENOBUFS returned if send buffers are insufficient; application retries
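A minimal sketch of the application-side retry on ENOBUFS, written against Python's socket.sendmsg(). The helper name send_with_retry and the attempts and delay values are assumptions for illustration; the slides only say that the application retries.

```python
import errno
import time


def send_with_retry(sock, data, addr, attempts=5, delay=0.001):
    """Call sock.sendmsg(), retrying while the error is ENOBUFS.

    attempts and delay are illustrative values, not taken from the slides.
    """
    for attempt in range(attempts):
        try:
            return sock.sendmsg([data], [], 0, addr)
        except OSError as exc:
            # Only "no send buffers" is retried; other errors propagate,
            # as does ENOBUFS once the last attempt has failed.
            if exc.errno != errno.ENOBUFS or attempt == attempts - 1:
                raise
            time.sleep(delay)
```

A short pause between attempts gives queued sends time to complete and free buffers; a real application might prefer to wait in poll() instead of sleeping.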
Receive
• Identical to UDP recvmsg()
– similar blocking/non-blocking behavior
• “Slow” receiver ports are stalled at the sender side
– a combination of activity (LRU) and memory utilization is used to detect slow receivers
– sendmsg() to a stalled destination port returns EWOULDBLOCK; the application can retry
• a blocking socket can wait for the unblock
– recvmsg() on a stalled port un-stalls it
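From the sending application's side, a stalled destination looks like an EWOULDBLOCK on a non-blocking socket. The sketch below retries once after waiting for the socket to become writable. It assumes that the un-stall is reported as writability through poll/select, which the slides do not state explicitly; the helper name send_when_unstalled and the timeout value are likewise hypothetical.

```python
import select


def send_when_unstalled(sock, data, addr, timeout=1.0, wait=select.select):
    """Non-blocking send that retries once after a stall.

    Returns the number of bytes sent, or None if the socket did not
    become writable within timeout seconds. The wait parameter exists
    only so the waiting step can be replaced in tests.
    """
    try:
        return sock.sendmsg([data], [], 0, addr)
    except BlockingIOError:
        # Python raises BlockingIOError for EWOULDBLOCK/EAGAIN.
        pass
    _, writable, _ = wait([], [sock], [], timeout)
    if not writable:
        return None
    # A second EWOULDBLOCK here propagates to the caller.
    return sock.sendmsg([data], [], 0, addr)
```

On the receiving side nothing special is needed: per the slide, an ordinary recvmsg() on the stalled port is what un-stalls it.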
High Availability (failover)
• Use of RC and on-demand connection setup allows HA
– connection setup/teardown is transparent to applications
– every sendmsg() could “potentially” result in a connection setup
– if a path fails, the connection is torn down; the next send can connect on an alternate path (different port or different HCA)
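The failover behavior can be pictured with a toy model. This is a sketch under assumptions, not the RDS code: the class name FailoverConnection, the round-robin choice of the next path, and the ("hca0", 1) style path labels are all invented for illustration.

```python
class FailoverConnection:
    """Toy model: reconnect on an alternate path after a path failure."""

    def __init__(self, paths):
        self.paths = list(paths)  # hypothetical (HCA, port) labels
        self.next_index = 0       # path to use for the next setup
        self.active = None        # None means "no connection"

    def path_failed(self):
        """Tear the connection down and move on to the next path."""
        if self.active is not None:
            self.active = None
            self.next_index = (self.next_index + 1) % len(self.paths)

    def send(self, payload):
        """Connect on demand, then return the path that carried the send."""
        if self.active is None:
            self.active = self.paths[self.next_index]
        return self.active


conn = FailoverConnection([("hca0", 1), ("hca0", 2)])
print(conn.send(b"x"))  # prints ('hca0', 1)
conn.path_failed()      # path failure tears the connection down
print(conn.send(b"y"))  # prints ('hca0', 2)
```

The application calls send() the same way before and after the failure; only the path underneath changes.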
Preliminary performance: RDS on OpenIB
[Chart: netperf (UDP_STREAM) throughput in Mbits/sec (axis 0 to 4000) versus msg size (2k to 64K bytes). Series: UDP GbE, UDP ipoib send, UDP ipoib recv, Rds (send = recv).]
* Dual 2.4GHz Xeon, 2G memory, 4x PCI-X HCA
** Sdp ~3700Mb/sec (TCP_STREAM)
Preliminary performance: RDS on OpenIB
[Chart: netperf (UDP_STREAM) throughput in Mbits/sec (axis 0 to 4000) versus msg size (2k to 64K bytes). Series: UDP GbE, UDP ipoib recv, Rds (send = recv).]
* Dual 2.4GHz Xeon, 2G memory, 4x PCI-X HCA
** Sdp ~3700Mb/sec (TCP_STREAM)
Preliminary performance: RDS on OpenIB
[Chart: latency in usec (axis 0 to 500) versus msg size (4 to 32768 bytes). Series: UDP GigE, UDP ipoib, Rds.]
Status in OpenIB
• Z-copy
• Functionally 98% complete
• Running Netperf
• Running Oracle unit test (crload), stable today
• Code checked into contrib/silverstorm/: https://openib.org/svn/trunk/contrib/silverstorm/rds/
Future
• AIO
• Z-copy
• Shared recv queue
Q&A