12th ANNUAL WORKSHOP 2016

KERNEL VERBS API UPDATE

Leon Romanovsky, Mellanox Technologies
[ April 4th, 2016 ]

OpenFabrics Alliance Workshop 2016

AGENDA

§ Memory registration API
§ CQ polling API
§ Draining QP
§ Generic RDMA WRITE/READ API

MEMORY REGISTRATION API

MULTIPLE MEMORY REGISTRATION METHODS

§ Physical memory regions (MR)
  • Synchronous interface
  • Every registration creates a new MR

§ Fast memory registration (FMR)
  • Fast synchronous interface
  • Weak deregistration semantics
  • Not widely adopted

§ Memory windows (MW)
  • Fast, asynchronous interface
  • Binds contiguous apertures to existing memory regions
  • Not relevant to kernel ULPs
  • Removed from kernel API

§ Fast memory registration mode (FRWR)
  • Asynchronous interface
  • Maps blocks of physical memory
  • Widely adopted

FRWR MAPPINGS

§ All buffers must be block aligned (page shift)
§ The first buffer can have first byte offset (FBO)
§ The last buffer can end before the end of block (EOB)
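The three rules above can be sketched as a small userspace check. This is only an illustration: `struct sg_buf` and `frwr_mappable()` are hypothetical names and not part of the kernel API. It assumes each buffer is described by a DMA address and a length.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped buffer; not a kernel type. */
struct sg_buf {
    uint64_t addr; /* DMA address of the first byte */
    uint32_t len;  /* length in bytes */
};

/*
 * Return true if the buffers fit one FRWR mapping built from blocks of
 * (1 << page_shift) bytes: only the first buffer may start inside a
 * block (FBO) and only the last buffer may end inside a block (EOB).
 */
static bool frwr_mappable(const struct sg_buf *sg, size_t n,
                          unsigned int page_shift)
{
    const uint64_t mask = ((uint64_t)1 << page_shift) - 1;

    for (size_t i = 0; i < n; i++) {
        uint64_t start = sg[i].addr;
        uint64_t end = start + sg[i].len;

        if (i > 0 && (start & mask))
            return false; /* a later buffer starts mid-block */
        if (i + 1 < n && (end & mask))
            return false; /* an earlier buffer ends mid-block */
    }
    return n > 0;
}
```

With 4 KiB blocks (`page_shift` of 12), a list whose first buffer starts at offset 0x100 and whose last buffer is 0x200 bytes long passes, while a list whose first buffer stops halfway through a block does not.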

FRWR DRAWBACKS

§ Allocation of a free memory region and a fast_reg_page_list
§ Translation from S/G list to page vector for each ULP
§ No ability to support arbitrary S/G
  • Each ULP bridged this semantic gap on its own
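To make the second drawback concrete, here is a rough userspace sketch of the kind of S/G-to-page-vector translation each ULP had to carry. The names `struct sg_ent` and `sg_to_page_vec()` are hypothetical and the code is a simplified model, not taken from any ULP.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped S/G entry; not a kernel type. */
struct sg_ent {
    uint64_t addr; /* DMA address of the first byte */
    uint32_t len;  /* length in bytes */
};

/*
 * Fill pages[] with the block-aligned address of every block touched by
 * the S/G list. Returns the number of entries written, or -1 if the list
 * has a gap FRWR cannot express or does not fit in max_pages.
 */
static int sg_to_page_vec(const struct sg_ent *sg, size_t n,
                          unsigned int page_shift,
                          uint64_t *pages, size_t max_pages)
{
    const uint64_t page_size = (uint64_t)1 << page_shift;
    const uint64_t mask = page_size - 1;
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t start = sg[i].addr;
        uint64_t end = start + sg[i].len;

        /* Only the first entry may start, and the last may end, mid-block. */
        if ((i > 0 && (start & mask)) || (i + 1 < n && (end & mask)))
            return -1;

        for (uint64_t p = start & ~mask; p < end; p += page_size) {
            if (used == max_pages)
                return -1;
            pages[used++] = p;
        }
    }
    return (int)used;
}
```

The `-1` path for gaps is the "no arbitrary S/G" limitation: a caller in this model has to split the request or fall back to another method when the list is not block aligned.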

OLD FRWR API

OP                  API
Allocate            frpl = ib_alloc_fast_reg_page_list()
                    mr = ib_alloc_fast_reg_mr()
Free                ib_free_fast_reg_page_list(frpl)
                    ib_dereg_mr(mr)
Register interface  struct {
(post_send)             u64 iova_start;
                        struct ib_fast_reg_page_list *page_list;
                        unsigned int page_shift;
                        unsigned int page_list_len;
                        u32 length;
                        int access_flags;
                        u32 rkey;
                    } fast_reg;
Opcode              IB_WR_FAST_REG_MR

NEW FRWR API

OP                  API
Allocate            mr = ib_alloc_mr(pd, mr_type, max_num_sg)
Free                ib_dereg_mr(mr)
S/G list mapping    nents = ib_map_mr_sg(mr, sg, sg_nents, page_size);
Register interface  struct {
(post_send)             struct ib_send_wr wr;
                        struct ib_mr *mr;
                        int access;
                        u32 key;
                    } ib_reg_mr;
Opcode              IB_WR_REG_MR

ULP REGISTRATION FLOW

1.  Allocate MR
2.  Post send: register MR
3.  Post send: transfer key

CQ POLLING API

CQ POLLING CONSIDERATIONS

§ Different completion contexts
  • Kernel threads
  • Work queues
  • Software IRQs
  • Hardware IRQs

§ Fairness between multiple CQs
§ CQ re-arm policy
§ Handling missed events

CQ POLLING USAGE UNCERTAINTY

§ Work requests (WR) context return path
§ Work request IDs (wr_id) (un)reliable
§ Multiple sources of post_send completions
§ Polling in batches or one-by-one
§ Multiple CPUs (affinity)
§ CPU/NUMA locality

CQ POLLING API

OP                      API
Decide on polling type  enum ib_poll_context {
                            IB_POLL_DIRECT,    /* caller context, no hw completions */
                            IB_POLL_SOFTIRQ,   /* poll from softirq context */
                            IB_POLL_WORKQUEUE, /* poll from workqueue */
                        };
Allocate                struct ib_cq *ib_alloc_cq(dev, private, nr_cqe, comp_vector, poll_ctx);
Free                    void ib_free_cq(cq)
Completion return       struct ib_cqe {
(filled in WR)              void (*done)(struct ib_cq *cq, struct ib_wc *wc);
                        };
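The completion-return row can be illustrated with a userspace model of the callback pattern: the ULP embeds a `struct ib_cqe` in its own request, and the handler recovers the request from the completion. The types below are cut-down stand-ins with only the fields this sketch needs, and `struct my_request`, `my_request_done()` and `dispatch_completions()` are hypothetical names; in the kernel the dispatch is done by the core CQ code rather than by the ULP.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used here. */
struct ib_cq;
struct ib_wc;

struct ib_cqe {
    void (*done)(struct ib_cq *cq, struct ib_wc *wc);
};

struct ib_wc {
    struct ib_cqe *wr_cqe; /* completion entry supplied with the WR */
    int status;            /* 0 means success in this model */
};

struct ib_cq {
    int unused;
};

/* Same idea as the kernel's container_of(), written portably. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical ULP request that embeds its completion entry. */
struct my_request {
    int id;
    int completed; /* 0 = pending, 1 = ok, -1 = failed */
    struct ib_cqe cqe;
};

/* Completion handler: recover the request from the embedded cqe. */
static void my_request_done(struct ib_cq *cq, struct ib_wc *wc)
{
    struct my_request *req =
        container_of(wc->wr_cqe, struct my_request, cqe);

    (void)cq;
    req->completed = (wc->status == 0) ? 1 : -1;
}

/* Model of the dispatch loop: call the handler of each completion. */
static int dispatch_completions(struct ib_cq *cq, struct ib_wc *wcs, int n)
{
    int handled = 0;

    for (int i = 0; i < n; i++) {
        if (wcs[i].wr_cqe && wcs[i].wr_cqe->done) {
            wcs[i].wr_cqe->done(cq, &wcs[i]);
            handled++;
        }
    }
    return handled;
}
```

Because every completion carries a pointer to its own handler, the ULP no longer needs to decode a `wr_id` or keep a lookup table to find the context of a finished work request.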

CQ POLLING SUMMARY

§ Unify completion queue logic from different ULP clients

§ Simplify completion queue polling and interrupt handling
  • No need to poll, handle events and maintain logic

§ Resolve the unreliability of error completions
§ Support different polling schemes
§ Performance optimized

DRAINING QP

PROBLEM

§ After posting stops, it is unknown when the outstanding WRs have completed

[Figure: WRs on QP1 and QP2; CE in the CQ]

COMMON METHODS

1.  Wait until all previously posted WRs complete and then destroy the QP - the wait may be indefinite

2.  Destroy the QP without waiting for completions - requires maintaining a shadow state of all in-flight WQEs in order to free related state

3.  Modify the QP to error, and then poll the related CQ until it is empty - works only if the CQ is not shared with any other QP

ROBUST GENERIC DRAINING

1.  Cease posting new WRs to QP
2.  Change QP state to ERR
3.  Post marker WR (nop)
4.  Wait until marker WR completes

[Figure: WRs and a marker WR on QP1 and QP2; CE in the CQ]
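The four draining steps can be modelled in userspace as follows. This is a sketch under simplifying assumptions: the queue completes WRs strictly in FIFO order, and "waiting" is a polling loop rather than a sleep on a completion. All names (`model_qp`, `model_wr`, `model_drain_sq()` and so on) are hypothetical and do not correspond to kernel structures.

```c
#include <stdbool.h>
#include <stddef.h>

#define SQ_DEPTH 16

enum wr_status { WR_SUCCESS, WR_FLUSH_ERR };

/* Hypothetical model of a work request; not a kernel structure. */
struct model_wr {
    void (*done)(struct model_wr *wr, enum wr_status status);
    bool completed;
};

/* Hypothetical send queue that completes WRs in FIFO order. */
struct model_qp {
    struct model_wr *sq[SQ_DEPTH];
    size_t head, tail;
    bool error; /* set once the QP has been moved to the error state */
};

static bool model_post(struct model_qp *qp, struct model_wr *wr)
{
    if (qp->tail - qp->head == SQ_DEPTH)
        return false; /* queue full */
    qp->sq[qp->tail++ % SQ_DEPTH] = wr;
    return true;
}

/* Complete the oldest WR; in the error state every WR is flushed. */
static bool model_poll_one(struct model_qp *qp)
{
    struct model_wr *wr;

    if (qp->head == qp->tail)
        return false;
    wr = qp->sq[qp->head++ % SQ_DEPTH];
    wr->completed = true;
    if (wr->done)
        wr->done(wr, qp->error ? WR_FLUSH_ERR : WR_SUCCESS);
    return true;
}

struct drain_marker {
    struct model_wr wr; /* must stay the first member */
    bool seen;
};

static void marker_done(struct model_wr *wr, enum wr_status status)
{
    /* wr is the first member, so the cast recovers the marker. */
    struct drain_marker *m = (struct drain_marker *)wr;

    (void)status;
    m->seen = true;
}

/* Steps 2-4; the caller has already stopped posting (step 1). */
static bool model_drain_sq(struct model_qp *qp)
{
    struct drain_marker marker = { .wr = { .done = marker_done } };

    qp->error = true;                /* step 2: move the QP to ERR */
    if (!model_post(qp, &marker.wr)) /* step 3: post the marker WR */
        return false;
    while (!marker.seen)             /* step 4: wait for the marker */
        if (!model_poll_one(qp))
            return false;
    return true;
}
```

Since the marker is queued behind everything posted earlier, seeing its completion implies that all earlier WRs on that queue have completed, regardless of what other QPs put into the same CQ. The model fails the drain when the queue has no free slot for the marker, so a caller in this model would need to keep one slot in reserve.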

DRAIN QP API

§ IB/core code to drain SQ/RQ/both in a single function call

§ Protects against use-after-free in error flows
§ Synchronized operation in one place

OP        API
Drain SQ  void ib_drain_sq(qp)
Drain RQ  void ib_drain_rq(qp)
Drain QP  void ib_drain_qp(qp)

GENERIC RDMA READ/WRITE (WIP)

MOTIVATION

§ RDMA READ/WRITE is a common operation for storage protocols
§ Every ULP has its own implementation
§ Lack of support for generic S/G lists
§ HCA-aware implementations
§ Reuse pre-allocated MRs (MR pool)
§ Support large number of S/G entries

REFERENCES

§ IB: new common API for draining queues by Steve Wise
§ Generic RDMA READ/WRITE API by Christoph Hellwig
§ Completion queue abstraction by Christoph Hellwig and Sagi Grimberg
§ New fast registration API by Sagi Grimberg

THANK YOU

Leon Romanovsky, Mellanox Technologies