Enabling NVMe WRR support in Linux Block Layer
USENIX HotStorage’17
Kanchan Joshi, Praval Choudhary, Kaushal Yadav
Memory Solutions, Samsung Semiconductor India R&D
Outline
NVMe I/O queues
Arbitration methods and WRR
What it takes to build differentiated I/O service
Affinity based method and its drawback
Proposed method
Results
Summary
NVMe I/O Queues
[Diagram: host, I/O queues, NVMe SSD]
Per-CPU queue pair
Parallel I/O distribution
Fast core-local path
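Arbitration Methods
Round-Robin (RR): the controller arbitrates among all submission queues in turn.
Weighted Round-Robin with urgent priority (WRR): the controller arbitrates among High, Medium, and Low priority queues according to their weights.
[Diagram: a controller under RR, and a controller under WRR with High, Medium, and Low queue classes and weights 3, 2, and 1.]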
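To make the difference between the two arbitration methods concrete, here is a minimal Python sketch of weighted round-robin with an urgent class. It is a simplified model, not the controller's implementation: the `arbitrate` function and the queue names are illustrative, the assignment of weights 3, 2, and 1 to High, Medium, and Low is an assumption based on the diagram labels, and the arbitration burst that real controllers apply is omitted.

```python
from collections import deque


def arbitrate(queues, weights, budget):
    """Return the order in which up to `budget` commands are fetched.

    queues:  dict mapping "urgent", "high", "medium", "low" to deques of commands
    weights: dict mapping "high", "medium", "low" to commands allowed per round
    """
    if any(w < 1 for w in weights.values()):
        raise ValueError("weights must be at least 1")
    fetched = []
    while len(fetched) < budget and any(queues.values()):
        # Urgent commands get strict priority over the weighted classes.
        if queues["urgent"]:
            fetched.append(queues["urgent"].popleft())
            continue
        # One weighted round: each class may contribute up to `weight` commands.
        for cls in ("high", "medium", "low"):
            for _ in range(weights[cls]):
                if not queues[cls] or len(fetched) >= budget:
                    break
                fetched.append(queues[cls].popleft())
    return fetched


# Illustrative workload (assumed): one urgent command, six in each other class.
demo_queues = {
    "urgent": deque(["U0"]),
    "high": deque(f"H{i}" for i in range(6)),
    "medium": deque(f"M{i}" for i in range(6)),
    "low": deque(f"L{i}" for i in range(6)),
}
order = arbitrate(demo_queues, {"high": 3, "medium": 2, "low": 1}, budget=13)
print(order)
```

In this run the urgent command is fetched first, and the next twelve fetches split 6, 4, and 2 across High, Medium, and Low, which is the 3:2:1 weight ratio. Setting all three weights to 1 makes the weighted classes behave like plain round-robin.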
Problem Statement
How can the benefits of the device's prioritization capability (WRR) reach applications?
WRR Support Requirements
I/O Prioritization
Need to create prioritized I/O queues
Retain NUMA-friendly path
I/O classification
How can an application specify its I/O service?
Per application or per I/O?
[Diagram: instead of non-prioritized submission queues (SQs), the device exposes prioritized SQs (URGENT, HIGH, MEDIUM, LOW). An I/O classification method routes the I/O of applications APP1 to APP4 to these prioritized SQs.]
Affinity-based Method
Prioritization method: Each core hosts one type of submission queue (1:1 mapping)
Classification method: Affine applications to particular core(s)
[Diagram: four cores, each with one SQ/CQ pair of a single priority: Core 0 URGENT, Core 1 HIGH, Core 2 MEDIUM, Core 3 LOW. All queues feed the NVMe controller. Applications are affined to the core whose queue matches their priority.]
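As an illustration of the classification step in the affinity-based method, the sketch below pins a process to the core that hosts the queue of the desired priority. The core numbering follows the diagram. `CORE_FOR_PRIORITY`, `core_for_priority`, and `affine_to_priority` are hypothetical names, and `os.sched_setaffinity` is the Linux-only Python standard library call assumed here for pinning.

```python
import os

# Core-to-priority layout from the diagram: one queue type per core (1:1 mapping).
CORE_FOR_PRIORITY = {"urgent": 0, "high": 1, "medium": 2, "low": 3}


def core_for_priority(priority):
    """Return the core whose submission queue has the given priority."""
    try:
        return CORE_FOR_PRIORITY[priority.lower()]
    except KeyError:
        raise ValueError(f"unknown priority: {priority!r}") from None


def affine_to_priority(priority, pid=0):
    """Pin a process (pid 0 means the caller) to the core for `priority`.

    Linux only; raises OSError if that core is not available to the process.
    """
    core = core_for_priority(priority)
    os.sched_setaffinity(pid, {core})
    return core
```

Under this scheme every application has to be pinned this way, so the cores an application may run on are dictated by its I/O priority.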
Drawbacks
All running applications must be affined (Arbitrary I/O performance otherwise)
[Diagram: cores C1, C2, and C3 are dedicated to HIGH, MEDIUM, and LOW priority applications, respectively.]
Reduction in compute-ability
Mandatory affinity leads to asymmetric core utilization
Proposed Method: I/O Priority-based
I/O Prioritization
Create prioritized I/O queues on each core
Retain NUMA-friendly path
I/O Classification
Link NVMe priorities to existing I/O priority classes
Per-application
[Diagram: Core 0 changes from a single SQ/CQ pair (URGENT only) to four SQs (URGENT, HIGH, MEDIUM, LOW) sharing one CQ. The I/O scheduling class of each application (APP1 to APP4) determines the NVMe queue priority, and therefore the SQ, that its I/O uses.]
I/O Priority-based Method
Prioritization method: Each core hosts four types of submission queue (4:1 mapping)
Classification method: Reuse existing I/O scheduling classes
[Diagram: Cores 0 to 3 each host four SQs (URGENT, HIGH, MEDIUM, LOW) and one CQ. All queues feed the NVMe controller.]
Compute-ability unaffected
Does not require modifying applications
[Diagram build: on every core, the URGENT, HIGH, MEDIUM, and LOW submission queues are labeled with the I/O scheduling classes Real-time, Best-effort, None, and Idle, respectively.]
Modified NVMe Stack (4.10 Kernel)
[Diagram: the I/O stack, showing the VFS/page cache, the block layer with its single-queue and multi-queue paths, the deadline and CFQ schedulers, the SATA driver, and the modified NVMe driver.]
Specify the I/O priority class when running the application (ionice)
This value is stored in the io_context inside task_struct
Obtain the I/O class value from io_context (or from the request)
Map the I/O class to a queue-priority value and place the command in the corresponding SQ
I/O scheduling class -> NVMe queue priority
Real-time -> Urgent
Best-effort -> High
None -> Medium
Idle -> Low
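The mapping step can be sketched as follows. This is an illustrative Python model, not the modified driver's code. The I/O priority class numbers and the 13-bit class shift follow the Linux ioprio encoding, and the queue-priority numbers follow the NVMe QPRIO field encoding. `ioprio_value`, `qprio_for_ioprio`, and `sq_index`, along with its zero-based layout of four queues per core, are hypothetical helpers.

```python
# Linux I/O priority classes (values from the kernel's ioprio encoding).
IOPRIO_CLASS_NONE, IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, IOPRIO_CLASS_IDLE = 0, 1, 2, 3
IOPRIO_CLASS_SHIFT = 13  # the class sits above the 13 bits of priority data

# NVMe submission queue priority (QPRIO) encoding: 0 is the most urgent.
URGENT, HIGH, MEDIUM, LOW = 0, 1, 2, 3

# Mapping from the slide: Real-time, Best-effort, None, Idle.
CLASS_TO_QPRIO = {
    IOPRIO_CLASS_RT: URGENT,
    IOPRIO_CLASS_BE: HIGH,
    IOPRIO_CLASS_NONE: MEDIUM,
    IOPRIO_CLASS_IDLE: LOW,
}


def ioprio_value(io_class, data=0):
    """Pack a class and its priority data into a single ioprio value."""
    return (io_class << IOPRIO_CLASS_SHIFT) | data


def qprio_for_ioprio(ioprio):
    """Translate an ioprio value into an NVMe queue priority."""
    io_class = ioprio >> IOPRIO_CLASS_SHIFT
    return CLASS_TO_QPRIO[io_class]


def sq_index(core, ioprio, queues_per_core=4):
    """Pick the core-local SQ for a request (hypothetical zero-based layout)."""
    return core * queues_per_core + qprio_for_ioprio(ioprio)
```

Because the queue is chosen from the submitting core and the I/O class together, a request stays on its core-local path whatever its priority.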
Ionice example on NVMe
I/O class (NVMe priority): value shown
Best-effort (High): 210K
None (Medium): 143K
Idle (Low): 75.8K
Experimental Setup
Linux 4.10 Kernel (Modified NVMe Driver)
Dell R720 server: 32 CPUs (2 NUMA nodes), 32 GB RAM
Samsung PM1725a SSD (with WRR arbitration)
Result #1
IOPS distribution among 3 applications
Application configuration: 4 FIO jobs, QD 64, 4K record
Weight-based distribution
[Chart annotations: 420, 420, 423, 419]
Result #2
Bandwidth distribution among 3 applications
Application configuration: 4 FIO jobs, QD 64, 128K record
Weight-based distribution
Result #3
Foreground/Background IO control
[Chart: foreground read IOPS]
WRR mode: the background process can be throttled
16:1 = throttle the background process
128:1 = further throttling; retains foreground performance
RR mode: sharp decline in IOPS; the background process cannot be throttled
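As a rough guide to what these weight ratios imply, the sketch below computes the idealized share of fetched commands for each class when both queues stay backlogged. This is a back-of-the-envelope model under that assumption, not a measurement from the talk, and `ideal_shares` is a hypothetical helper.

```python
from fractions import Fraction


def ideal_shares(weights):
    """Share of fetched commands per class when every queue stays backlogged."""
    total = sum(weights.values())
    return {name: Fraction(w, total) for name, w in weights.items()}


for fg, bg in ((16, 1), (128, 1)):
    shares = ideal_shares({"foreground": fg, "background": bg})
    print(f"{fg}:{bg} -> foreground {float(shares['foreground']):.1%}, "
          f"background {float(shares['background']):.1%}")
```

Under this model a 16:1 ratio leaves the background process about 6% of the fetched commands, and 128:1 leaves it under 1%.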
Summary
Differentiated I/O service for applications can be built using WRR arbitration
Scheduler-independent prioritization: Applications get the advantage of the prioritization natively present inside the device
Proposed method does not reduce compute-ability of applications
Because no new interface or API is introduced, applications do not need to be rebuilt
Future work: kernel patch; sysfs support for run-time WRR configuration
Acknowledgements
Rajesh Sahoo, Anshul Sharma, Sungyoung Ahn, Manoj Thapliyal, Vikram Singh, and Seunguk Shin