Enabling NVMe WRR support in Linux Block Layer

USENIX HotStorage’17

Kanchan Joshi, Praval Choudhary, Kaushal Yadav

Memory Solutions, Samsung Semiconductor India R&D

Outline

NVMe I/O queues

Arbitration methods and WRR

What it takes to build differentiated I/O service

Affinity based method and its drawback

Proposed method

Results

Summary

NVMe I/O Queues

[Figure: host, I/O queues, and NVMe SSD]

Per-CPU queue pair

Parallel I/O distribution

Fast core-local path

Arbitration Methods

Round-Robin (RR)

Weighted Round-Robin with urgent priority (WRR)

[Figure: in both modes the controller arbitrates among the submission queues. The WRR panel shows High, Medium, and Low priority queues and weights 3, 2, and 1.]
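The arbitration behavior can be sketched as a small simulation. This is only an illustrative model, not controller firmware: it assumes the urgent class is always served before the weighted classes, and it assumes the slide's weights 3, 2, and 1 belong to the high, medium, and low classes in that order. The function name `wrr_arbitrate` and the queue contents are hypothetical.

```python
from collections import Counter, deque


def wrr_arbitrate(queues, weights, budget):
    """Return the order in which commands are fetched from the queues.

    queues:  dict mapping class name -> deque of pending commands
    weights: dict mapping 'high'/'medium'/'low' -> commands fetched per round
    budget:  total number of commands to fetch
    """
    fetched = []
    while len(fetched) < budget:
        # Assumption: urgent commands have strict priority over weighted ones.
        if queues["urgent"]:
            fetched.append(queues["urgent"].popleft())
            continue
        progressed = False
        # One weighted round: each class may contribute up to its weight.
        for cls in ("high", "medium", "low"):
            for _ in range(weights[cls]):
                if len(fetched) >= budget or not queues[cls]:
                    break
                fetched.append(queues[cls].popleft())
                progressed = True
        if not progressed:
            break  # every queue is empty
    return fetched


# Hypothetical workload: one urgent command, the other classes saturated.
queues = {
    "urgent": deque(["U0"]),
    "high": deque(f"H{i}" for i in range(100)),
    "medium": deque(f"M{i}" for i in range(100)),
    "low": deque(f"L{i}" for i in range(100)),
}
order = wrr_arbitrate(queues, {"high": 3, "medium": 2, "low": 1}, budget=61)
counts = Counter(cmd[0] for cmd in order)
print(order[:7])
print(dict(counts))
```

With all three weighted queues saturated, the fetched commands split in proportion to the weights, which is the property the rest of the talk relies on.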

Problem Statement

How can the benefits of the device's prioritization capability (WRR) reach applications?

WRR Support Requirements

I/O Prioritization

Need to create prioritized I/O queues

Retain NUMA-friendly path

I/O classification

How can an application specify its I/O service?

Per-application or per I/O?

[Figure: non-prioritized queues (four plain SQs) contrasted with prioritized queues (URGENT, HIGH, MEDIUM, and LOW SQs); an I/O classification method routes APP1 to APP4 to the SQs]

Affinity-based Method

Prioritization method: Each core hosts one type of submission queue (1:1 mapping)

Classification method: Affine applications to particular core(s)

[Figure: each core hosts one SQ/CQ pair attached to the NVMe controller. Core 0 is URGENT, Core 1 is HIGH, Core 2 is MEDIUM, and Core 3 is LOW. One application is affined to each core and receives that core's priority.]
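A minimal model of this 1:1 scheme, assuming the core-to-priority layout from the diagram. The helper `priorities_for` is hypothetical and only computes which priorities an application could receive from the set of cores it is allowed to run on. On Linux the pinning itself could be done with `taskset` or `os.sched_setaffinity`.

```python
# Core -> queue priority, as drawn on the slide (one SQ/CQ pair per core).
CORE_PRIORITY = {0: "URGENT", 1: "HIGH", 2: "MEDIUM", 3: "LOW"}


def priorities_for(allowed_cores):
    """Priorities an application may receive, given the cores it may run on."""
    return sorted({CORE_PRIORITY[core] for core in allowed_cores})


# An application affined to core 1 always submits through the HIGH queue.
pinned = priorities_for({1})

# An application that is not affined may be scheduled on any core,
# so its I/O can land in any of the four queues.
unpinned = priorities_for({0, 1, 2, 3})

print(pinned)
print(unpinned)
```

The second case shows why the priority seen by an application depends entirely on where the scheduler places it unless the application is explicitly pinned.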

Drawbacks

All running applications must be affined (otherwise their I/O performance is arbitrary)

[Figure: cores C1, C2, and C3 carry HIGH, MEDIUM, and LOW priority respectively]

Reduction in compute-ability

Mandatory affinity leads to asymmetric core utilization

Proposed Method: I/O Priority-based

I/O Prioritization

Create prioritized I/O queues on each core

Retain NUMA-friendly path

I/O Classification

Link NVMe priorities to existing I/O priority classes

Per-application

[Figure: Core 0 changes from one URGENT SQ with its CQ to four SQs (URGENT, HIGH, MEDIUM, LOW) sharing one CQ. APP1 to APP4 reach the SQs through their I/O scheduling class, which is translated to an NVMe queue priority.]

I/O Priority-based Method

Prioritization Method: Each core hosts four types of submission queues (4:1 mapping)

Classification Method: Reuse existing I/O scheduling classes

[Figure: Cores 0 to 3 each host four SQs (URGENT, HIGH, MEDIUM, LOW) and one CQ, all attached to the NVMe controller]

Compute-ability unaffected

Does not require modifying applications

[Figure: in the final build of this diagram, the URGENT, HIGH, MEDIUM, and LOW queues on every core are labeled with the I/O scheduling classes Real-time, Best-effort, None, and Idle respectively]
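One way to picture the 4:1 layout is as a queue-numbering scheme. The sketch below is an assumption for illustration, not the numbering used by the modified driver: it relies only on the facts that queue ID 0 is the NVMe admin queue and that several submission queues may share one completion queue. The names `sq_id` and `cq_id` are hypothetical.

```python
PRIORITIES = ("URGENT", "HIGH", "MEDIUM", "LOW")


def sq_id(core, priority):
    """Hypothetical SQ numbering: four consecutive SQs per core, starting at 1."""
    return 1 + core * len(PRIORITIES) + PRIORITIES.index(priority)


def cq_id(core):
    """Hypothetical CQ numbering: one CQ per core, shared by its four SQs."""
    return 1 + core


# Layout for the four cores shown on the slide.
layout = {
    core: {prio: sq_id(core, prio) for prio in PRIORITIES}
    for core in range(4)
}
for core, sqs in layout.items():
    print(f"core {core}: SQs {sqs} -> CQ {cq_id(core)}")
```

Because every core owns a queue of every priority, a command can be submitted from whichever core the application happens to run on, so no affinity is needed.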

Modified NVMe Stack (4.10 Kernel)

[Figure: I/O stack. VFS/Page cache sits above the Block Layer, which has single-queue (deadline, CFQ) and multi-queue paths; below are the SATA driver and the modified NVMe driver]

Specify the I/O priority class when running the application (ionice)

The class is stored in io_context inside task_struct

Obtain the I/O class from io_context (or from the request)

Map the I/O class to a queue-priority value and place the command in the corresponding SQ

I/O scheduling class to NVMe queue priority:

Real-time: Urgent

Best-effort: High

None: Medium

Idle: Low

Ionice example on NVMe

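[Figure: ionice runs on NVMe, one per I/O class, each annotated with its queue priority and measured rate]

Best-effort (High): 210K

None (Medium): 143K

Idle (Low): 75.8K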
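The three figures can be turned into relative shares with a little arithmetic. The sketch assumes the numbers are IOPS in thousands, which the slide does not state, and it makes no claim about the WRR weights that were configured for this run.

```python
# Values from the slide, assumed to be thousands of IOPS.
measured = {
    "Best-effort (High)": 210.0,
    "None (Medium)": 143.0,
    "Idle (Low)": 75.8,
}

total = sum(measured.values())

# Fraction of the combined rate that each class received.
shares = {name: value / total for name, value in measured.items()}

# Rate of each class relative to the lowest-priority class.
ratio_to_low = {name: value / measured["Idle (Low)"]
                for name, value in measured.items()}

for name in measured:
    print(f"{name}: {shares[name]:.1%} of total, "
          f"{ratio_to_low[name]:.2f}x the Idle rate")
```

Under that assumption the split is roughly 49%, 33%, and 18%, so the highest class runs at a bit under three times the rate of the lowest.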

Experimental Setup

Linux 4.10 Kernel (Modified NVMe Driver)

Dell R720 server: 32 CPUs (2 NUMA nodes), 32 GB RAM

Samsung PM1725a SSD (With WRR arbitration)

Result #1

IOPS distribution among 3 applications

Application configuration: 4 FIO jobs, QD 64, 4K record

Weight-based distribution

[Figure: IOPS distribution chart with data labels 420, 420, 423, 419]

Result #2

Bandwidth distribution among 3 applications

Application configuration: 4 FIO jobs, QD 64, 128K record

Weight-based distribution

Result #3

Foreground/Background IO control

[Figure: Foreground Read IOPS]

WRR mode

Background process can be throttled

16:1 = Throttle BG process

128:1 = Further throttling. Retains foreground performance

RR mode

Sharp decline in IOPS

Background process cannot be throttled
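As a rough guide to what these ratios mean, the idealized share of fetched commands can be computed directly. The sketch assumes that 16:1 and 128:1 are foreground-to-background weight ratios and that both queues always have commands pending; real devices will deviate from this ideal. The function name `ideal_shares` is hypothetical.

```python
from fractions import Fraction


def ideal_shares(fg_weight, bg_weight):
    """Idealized command shares when both queues are always non-empty."""
    total = fg_weight + bg_weight
    return Fraction(fg_weight, total), Fraction(bg_weight, total)


for fg, bg in [(1, 1), (16, 1), (128, 1)]:
    fg_share, bg_share = ideal_shares(fg, bg)
    print(f"{fg}:{bg} -> foreground {float(fg_share):.2%}, "
          f"background {float(bg_share):.2%}")
```

Equal weights behave like plain round-robin, while 16:1 and 128:1 would leave the background process about 1/17 and 1/129 of the fetched commands in this idealized model.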

Summary

Differentiated I/O service for applications can be built using WRR arbitration

Scheduler-independent prioritization: Applications get the advantage of the prioritization natively present inside the device

Proposed method does not reduce compute-ability of applications

No new interface/API is introduced, so applications need not be rebuilt

Future work

Kernel patch

Sysfs support for run-time WRR configuration

Acknowledgements

Rajesh Sahoo, Anshul Sharma, Sungyoung Ahn, Manoj Thapliyal, Vikram Singh, and Seunguk Shin