import rdma: zero-copy networking with RDMA and Python
TRANSCRIPT
- 1. import rdma: zero-copy networking with RDMA and Python
Andy Grover @groveronline
http://groveronline.com
http://blogs.oracle.com/linuxnstuff
2. Plan
- Sockets, RDMA, and RDMA Sockets
3. Python and performance
4. Issues I ran into
5. (Questions anytime.)
6. Socket Example
- client: sock.sendto(server, get data.tgz)
7. server: recvfrom() -> (get data.tgz)
8. server: data = open(data.tgz).read()
9. server: sock.sendto(client, OK + data)
10. client: recvfrom() -> data
11. Sockets
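The request/response exchange in the socket example can be sketched with ordinary UDP sockets on loopback. This is a minimal illustration rather than code from the talk: the `FILES` dictionary, the `serve_once` helper, the 8 KiB payload, and the loopback address are all assumptions made so the sketch runs without a real `data.tgz`.

```python
import socket
import threading

# Hypothetical in-memory stand-in for data.tgz (assumption: 8 KiB payload).
FILES = {"data.tgz": b"x" * 8192}

def serve_once(sock):
    """Answer one 'get <name>' datagram with b'OK' followed by the file bytes."""
    request, client_addr = sock.recvfrom(65535)
    name = request.decode().split(" ", 1)[1]
    sock.sendto(b"OK" + FILES[name], client_addr)

server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
worker = threading.Thread(target=serve_once, args=(server,), daemon=True)
worker.start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.settimeout(5)
client.sendto(b"get data.tgz", server.getsockname())
reply, _ = client.recvfrom(65535)
worker.join()

# The reply carries the status and the payload in one datagram.
status, data = reply[:2], reply[2:]
client.close()
server.close()
```

Every byte of `data` here passes through a kernel buffer on both the sending and the receiving side, which is the copying that the following slides count.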
- Sending data from server (S) to client (C), how many buffer copies are performed on S? On C?
12. 2 on S, 2 on C
13. S: read from user buffer, write to kernel buffer by OS
14. S: read from kernel buffer by HW
15. C: write to kernel buffer by HW
16. C: read from kernel buffer, write to user buffer (OS)
17. So what?
- Socket interface is easy to use
18. Extra copy on each side consumes CPU
19. Also consumes 3x RAM bandwidth!
20. What do we do???
21. Direct Data Placement (RDMA)
22. RDMA?
- Target locks down user memory region and gives sender a key to reference it
23. Sender tells HW data buffer and key, Target HW uses key to place received data directly in user buffer
24. Done!
25. Way complicated
26. Aside: InfiniBand
- Cheap and high speed
27. Supports RDMA
28. RHEL 5.4+ supports natively
29. RDMA Sockets (RDS)
- Reliable Datagram Sockets
30. Full disclosure: my day job
31. Guaranteed delivery of datagrams
32. Allows RDMA ops via sendmsg() and CMSGs
33. Hides complexity of IB Verbs
34. Still pretty complex!
35. What is the simplest possible interface to use RDMA?
- Let's try Python
36. Learning opportunity
37. Can it be done?
38. Pythonically?
39. Can Python do efficient networking?
- Hell yes. Well, pretty sure
40. Interpreted, but fits in per-CPU cache
41. Many CPU cores these days
42. Shared RAM
43. Cache misses on the data
44. Implementation Issues.
45. Python strings
- Immutable
46. Buffers shared behind the scenes
47. Solution: mmap module
- map a file or anonymous memory
48. sliceable etc.
49. Python has no pointers
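The contrast between immutable strings and an `mmap` buffer can be shown in a few lines. This is a minimal sketch assuming Python 3, where network data is `bytes` rather than the Python 2 `str` of the talk; the 8192-byte size is borrowed from the RDMA example later in the deck, and the variable names are illustrative.

```python
import mmap

# Strings (and bytes) are immutable: item assignment raises TypeError.
text = "hello"
try:
    text[0] = "J"
    string_is_mutable = True
except TypeError:
    string_is_mutable = False

# An anonymous mapping (fileno -1) is a fixed-size, writable buffer.
buf = mmap.mmap(-1, 8192)
buf[0:5] = b"hello"     # slice assignment writes in place
buf[0:1] = b"J"         # so does overwriting a single byte
head = buf[0:5]         # slicing copies the bytes out
```

Because the mapping has a stable location and can be written in place, it is a natural candidate for a buffer that hardware will write into.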
- We need addresses of things
- To pin it
50. To map it to the hardware
Solution: C extension module using the new buffer protocol added in Python 2.6
51. Python stdlib doesn't support sendmsg/recvmsg
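The talk obtains buffer addresses with a small C extension module. As an alternative illustration only, and not the author's method, the standard `ctypes` module can also expose the address of a writable buffer such as an anonymous `mmap`. The sketch below assumes Python 3; it does not pin the memory or register it with any hardware, it only shows that an address can be recovered and that writes through it land in the same buffer.

```python
import ctypes
import mmap

buf = mmap.mmap(-1, 8192)

# from_buffer() shares memory with the mmap instead of copying it,
# so the ctypes array and the mmap refer to the same bytes.
c_view = (ctypes.c_char * len(buf)).from_buffer(buf)
address = ctypes.addressof(c_view)

# Writing through the raw address is visible through the mmap object.
ctypes.memmove(address, b"hi", 2)
first_two = buf[:2]
```

While `c_view` is alive the mapping cannot be closed or resized, which is the behaviour one wants for memory whose address has been handed out.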
- WHAT??
52. Solution: external library, python-eunuchs
53. Native support RSN
54. Python AF_RDS support
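Since this talk was given, `sendmsg()` and `recvmsg()` have been added to the standard `socket` module (Python 3.3 and later, on Unix-like platforms). The sketch below shows the general shape of sending a control message (CMSG) alongside data. It uses the well-known `SCM_RIGHTS` message to pass a file descriptor over a Unix socket pair, following the pattern in the Python socket documentation; the RDS-specific control messages used for RDMA are not shown, and the variable names are illustrative.

```python
import array
import os
import socket

left, right = socket.socketpair()      # connected AF_UNIX pair (Unix only)
read_fd, write_fd = os.pipe()

# Send two bytes of data plus one control message carrying write_fd.
fds = array.array("i", [write_fd])
left.sendmsg([b"fd"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])

# Receive the data and enough ancillary space for one descriptor.
received = array.array("i")
msg, ancdata, flags, addr = right.recvmsg(
    16, socket.CMSG_LEN(received.itemsize))
for level, ctype, cdata in ancdata:
    if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
        usable = len(cdata) - (len(cdata) % received.itemsize)
        received.frombytes(cdata[:usable])

# The received descriptor is a duplicate of the pipe's write end.
os.write(received[0], b"hello")
echoed = os.read(read_fd, 5)

for fd in (read_fd, write_fd, received[0]):
    os.close(fd)
left.close()
right.close()
```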
- Can't extend socket.socket
55. Solution: forget inheritance, just implement socket object methods
56. Implementing RdmaSocket
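The "forget inheritance" approach amounts to composition: hold a socket object and implement the socket methods by delegating to it. The class below is a hypothetical sketch of that pattern using plain UDP. `DatagramWrapper` and `describe` are names invented for illustration and are not the talk's `RdmaSocket`.

```python
import socket

class DatagramWrapper:
    """Hypothetical wrapper that holds a socket instead of subclassing it."""

    def __init__(self, family=socket.AF_INET):
        self._sock = socket.socket(family, socket.SOCK_DGRAM)

    # Socket methods are implemented by delegating to the held socket.
    def bind(self, address):
        self._sock.bind(address)

    def sendto(self, data, address):
        return self._sock.sendto(data, address)

    def recvfrom(self, bufsize):
        return self._sock.recvfrom(bufsize)

    def getsockname(self):
        return self._sock.getsockname()

    def close(self):
        self._sock.close()

    def describe(self):
        """Example of an extra method a plain socket does not have."""
        return "wrapped datagram socket on %s:%d" % self.getsockname()

receiver = DatagramWrapper()
receiver.bind(("127.0.0.1", 0))        # port 0: let the OS pick a free port
sender = DatagramWrapper()
sender.sendto(b"ping", receiver.getsockname())
payload, _ = receiver.recvfrom(1024)
label = receiver.describe()
sender.close()
receiver.close()
```

Callers that only use the socket methods cannot tell the wrapper from a real socket, and extra operations such as memory registration can be added as ordinary methods.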
- Python as much as possible
57. ctypes used heavily
58. C Module solely to return an address
59. RDMA Socket Example
- client: m = mmap(-1, 8192)
60. client: cookie = sock.get_mr(m)
61. client: sock.sendto(server, get data.tgz, my cookie is )
62. server: recvmsg() -> (get data.tgz, my cookie is )
63. server: m = mmap(data.tgz)
64. server: sock.rdma_sendmsg(client, m, cookieval, length, token, OK)
65. client: recvmsg() -> OK
66. OK...
- Extra overhead not worth it for small sizes
67. Copied OK instead of OK+8K, CPU and cache win
68. It worked!
69. Future investigations
- Actual performance data
70. Dogfood it -- simplify RDS utility apps
71. RDMA loves async: go Twisted
72. Summary
- Sysadmins
- IB is fast, cheap networking, even without RDMA
DB cluster & Storage cluster developers
- A new tool in the toolbox coming soon, even to Ethernet
Python coders
- Shared-something could be good if your I/O is good enough
C coders
- Writing a Python or xyz wrapper is straightforward, and enables a much wider pool of users
73. Thanks! http://github.com/agrover/python-rds