import rdma: zero-copy networking with RDMA and Python


Upload: groveronline

Post on 17-May-2015


TRANSCRIPT

1. import rdma: zero-copy networking with RDMA and Python

  • Andy Grover
  • @groveronline
  • http://groveronline.com
  • http://blogs.oracle.com/linuxnstuff

2. Plan

  • Sockets, RDMA, and RDMA Sockets
  • Python and performance
  • Issues I ran into
  • (Questions anytime.)

6. Socket Example

  • client: sock.sendto(server, "get data.tgz")
  • server: recvfrom() -> ("get data.tgz")
  • server: data = open("data.tgz").read()
  • server: sock.sendto(client, "OK" + data)
  • client: recvfrom() -> data

11. Sockets
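The Socket Example exchange can be tried on one machine. This is a minimal sketch, not the speaker's code: it assumes UDP over loopback, swaps the contents of data.tgz for an in-memory payload, and the helper names (serve_once, fetch, demo) are made up for illustration. Note that the real sendto() takes the data first and the address second, the reverse of the slide's shorthand.

```python
import socket
import threading


def serve_once(server_sock, payload):
    """Answer one request datagram with b"OK" followed by the payload."""
    request, client_addr = server_sock.recvfrom(2048)
    if request == b"get data.tgz":
        server_sock.sendto(b"OK" + payload, client_addr)


def fetch(server_addr):
    """Send the request and return whatever follows the b"OK" prefix."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
        client.settimeout(5.0)
        client.sendto(b"get data.tgz", server_addr)
        reply, _ = client.recvfrom(65535)
    if not reply.startswith(b"OK"):
        raise RuntimeError("unexpected reply")
    return reply[2:]


def demo(payload=b"x" * 8192):
    """Run server and client in one process; payload stands in for data.tgz."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.settimeout(5.0)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(target=serve_once, args=(server, payload))
    worker.start()
    try:
        return fetch(server.getsockname())
    finally:
        worker.join()
        server.close()
```

Every byte of the payload passes through a kernel buffer on both sides here, which is the cost the rest of the talk is about.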

  • Sending data from server (S) to client (C), how many buffer copies are performed on S? On C?
  • 2 on S, 2 on C
  • S: read from user buffer, write to kernel buffer by OS
  • S: read from kernel buffer by HW
  • C: write to kernel buffer by HW
  • C: read from kernel buffer, write to user buffer (OS)

17. So what?
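One plausible way to account for the cost of the copy steps listed above is to count how many times the payload crosses the memory bus on the sending side. The sketch below is back-of-envelope arithmetic only: it ignores CPU caches, and the step tables and function name are illustrative assumptions, not measurements.

```python
# Each entry: (description, RAM reads, RAM writes) for one pass over the payload.
SOCKET_SEND = [
    ("OS copy: user buffer -> kernel buffer", 1, 1),
    ("HW DMA: kernel buffer -> wire", 1, 0),
]
ZERO_COPY_SEND = [
    ("HW DMA: user buffer -> wire", 1, 0),
]


def ram_passes(steps):
    """Total number of times the payload is read from or written to RAM."""
    return sum(reads + writes for _, reads, writes in steps)


def ram_traffic_ratio():
    """How much more RAM traffic the copying path generates."""
    return ram_passes(SOCKET_SEND) / ram_passes(ZERO_COPY_SEND)
```

Under this accounting the socket path touches RAM three times per payload byte and the zero-copy path once.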

  • Socket interface is easy to use
  • Extra copy on each side consumes CPU
  • Also consumes 3x RAM bandwidth!
  • What do we do???
  • Direct Data Placement (RDMA)

22. RDMA?

  • Target locks down user memory region and gives sender a key to reference it
  • Sender tells HW data buffer and key, Target HW uses key to place received data directly in user buffer
  • Done!
  • Way complicated

26. Aside: InfiniBand

  • Cheap and high speed
  • Supports RDMA
  • RHEL 5.4+ supports natively

29. RDMA Sockets (RDS)

  • Reliable Datagram Sockets
  • Full disclosure: my day job
  • Guaranteed delivery of datagrams
  • Allows RDMA ops via sendmsg() and CMSGs
  • Hides complexity of IB Verbs
  • Still pretty complex!

35. What is the simplest possible interface to use RDMA?

  • Let's try Python
  • Learning opportunity
  • Can it be done?
  • Pythonically?

39. Can Python do efficient networking?

  • Hell yes. Well, pretty sure
  • Interpreted, but fits in per-CPU cache
  • Many CPU cores these days
  • Shared RAM
  • Cache misses on the data

44. Implementation Issues.

45. Python strings

  • Immutable
  • Buffers shared behind the scenes
  • Solution: mmap module
    • map a file or anonymous memory
    • sliceable etc.

49. Python has no pointers
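The mmap solution for the Python strings problem, and the need for a raw address, can both be shown in a few lines. This is a hedged sketch written for Python 3, where bytes plays the role the immutable str played in the talk. It obtains the address through ctypes rather than through a C extension module, which is a substitution made here for brevity; the function name demo is illustrative.

```python
import ctypes
import mmap


def demo():
    # bytes objects cannot be modified in place.
    data = b"hello"
    try:
        data[0] = ord("j")
        immutable = False
    except TypeError:
        immutable = True

    # An anonymous mapping is mutable and supports slicing.
    buf = mmap.mmap(-1, 8192)
    buf[0:5] = b"hello"
    buf[0:1] = b"j"
    first = bytes(buf[0:5])

    # Borrow the buffer through ctypes to learn where it lives in memory.
    view = ctypes.c_char.from_buffer(buf)
    address = ctypes.addressof(view)
    del view  # drop the borrowed reference so the mapping can be closed
    buf.close()
    return immutable, first, address
```

The returned address is only meaningful while the mapping is open; anything that pins or registers the memory has to do so before close() is called.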

  • We need addresses of things
    • To pin it
    • To map it to the hardware
  • Solution: C extension module using the new buffer protocol added in Python 2.6

51. Python stdlib doesn't support sendmsg/recvmsg

  • WHAT??
  • Solution: external library, python-eunuchs
  • Native support RSN (real soon now)

54. Python AF_RDS support
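Native support did arrive: since Python 3.3 the stdlib socket object has sendmsg() and recvmsg(), and the socket module exposes an AF_RDS constant on Linux builds that support it. The sketch below only demonstrates the calls themselves, over a Unix datagram socket pair; it assumes a Unix-like platform and sends no RDS control messages. The function name demo is illustrative.

```python
import socket


def demo():
    # A connected pair of Unix datagram sockets; no network or RDS needed.
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    with left, right:
        header = b"OK"
        body = bytearray(b"payload")
        # sendmsg() gathers several buffers into one datagram.
        sent = left.sendmsg([header, body])
        # recvmsg() returns the data plus any ancillary (CMSG) items.
        data, ancdata, flags, address = right.recvmsg(4096)
    # AF_RDS is only defined where the platform provides it.
    return sent, data, ancdata, hasattr(socket, "AF_RDS")
```

The second argument of sendmsg(), unused here, is where control messages (CMSGs) are passed.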

  • Can't extend socket.socket
  • Solution: forget inheritance, just implement socket object methods

56. Implementing RdmaSocket

  • Python as much as possible
  • ctypes used heavily
  • C Module solely to return an address

59. RDMA Socket Example

  • client: m = mmap(-1, 8192)
  • client: cookie = sock.get_mr(m)
  • client: sock.sendto(server, "get data.tgz, my cookie is ")
  • server: recvmsg() -> ("get data.tgz, my cookie is ")
  • server: m = mmap(data.tgz)
  • server: sock.rdma_sendmsg(client, m, cookieval, length, token, "OK")
  • client: recvmsg() -> "OK"

66. OK...
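Returning to the RDMA Socket Example steps: they need RDS-capable hardware and the python-rds module to run for real, so what follows is a toy, in-process simulation of the same flow. ToyFabric is a hypothetical stand-in for the RDMA hardware; its get_mr() borrows the name from the slide, but rdma_write() and release() are invented here and are not the python-rds API.

```python
import itertools
import mmap


class ToyFabric:
    """Pretend RDMA hardware: maps cookies to registered memory regions."""

    def __init__(self):
        self._regions = {}
        self._next_cookie = itertools.count(1)

    def get_mr(self, buf):
        """Register a writable buffer and return a cookie that names it."""
        cookie = next(self._next_cookie)
        self._regions[cookie] = memoryview(buf)
        return cookie

    def rdma_write(self, cookie, data):
        """Place data directly into the region named by the cookie."""
        region = self._regions[cookie]
        if len(data) > len(region):
            raise ValueError("data does not fit in the registered region")
        region[:len(data)] = data
        return len(data)

    def release(self, cookie):
        self._regions.pop(cookie).release()


def demo(file_bytes):
    fabric = ToyFabric()
    # client: allocate a buffer, register it, send the request with the cookie
    m = mmap.mmap(-1, 8192)
    cookie = fabric.get_mr(m)
    request = ("get data.tgz", cookie)
    # server: place the file contents in the client's buffer, reply with "OK"
    _name, cookieval = request
    length = fabric.rdma_write(cookieval, file_bytes)
    reply = b"OK"
    # client: the only message copied through the socket path is the reply
    received = bytes(m[:length])
    fabric.release(cookie)
    m.close()
    return reply, received
```

The point the simulation preserves is that the bulk data lands in the client's own buffer and only the short reply travels as an ordinary message.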

  • Extra overhead not worth it for small sizes
  • Copied "OK" instead of "OK" + 8K: CPU and cache win
  • It worked!

69. Future investigations

  • Actual performance data
  • Dogfood it -- simplify RDS utility apps
  • RDMA loves async: go Twisted

72. Summary

  • Sysadmins
    • IB is fast, cheap networking, even without RDMA
  • DB cluster & Storage cluster developers
    • A new tool in the toolbox coming soon, even to Ethernet
  • Python coders
    • Shared-something could be good if your I/O is good enough
  • C coders
    • Writing a Python or xyz wrapper is straightforward, and enables a much wider pool of users

73. Thanks!

  • http://github.com/agrover/python-rds