FlashShare: Punching Through Server
Storage Stack from Kernel to Firmware
for Ultra-Low Latency SSDs
Jie Zhang, Miryeong Kwon, Donghyun Gouk, Sungjoon Koh,
Changlim Lee, Mohammad Alian, Myoungjun Chun, Mahmut
Kandemir, Nam Sung Kim, Jihong Kim and Myoungsoo Jung
Executive Summary
FlashShare punches through the performance barriers of the server storage stack:
• Reduces average turnaround response times by 22%;
• Reduces 99th-percentile turnaround response times by 31%.
Datacenter servers co-run latency-critical and throughput applications and want memory-like performance from storage. A ULL-SSD offers ultra-low latency by itself, but the storage stack above it (file system, block layer, NVMe driver, flash firmware) is unaware of both the ULL-SSD and the latency-critical applications; this barrier causes interference and longer I/O latency.
Motivation: applications in datacenter
Datacenters execute a wide range of latency-critical workloads:
• Driven by the market of social media and web services;
• Required to satisfy a certain level of service-level agreement (SLA);
• Sensitive to latency (i.e., turnaround response time).
A typical example: Apache. An HTTP request arrives over the TCP/IP service and is queued; worker threads fetch the requested data objects and respond through the TCP/IP service, while a monitor watches the request queue. The key metric is user experience.
Motivation: applications in datacenter
• Latency-critical applications exhibit varying loads during a day.
• Datacenters overprovision server resources to meet the SLA.
• However, this results in low utilization and low energy efficiency.
Figure 1. Example diurnal pattern in queries per second for a Web Search cluster1 (loads vary through the day).
1 Power Management of Online Data-Intensive Services.
Figure 2. CPU utilization analysis of a Google server cluster2 (about 30% average utilization).
2 The Datacenter as a Computer.
Motivation: applications in datacenter
Popular solution: co-locating latency-critical and throughput workloads (e.g., Micro’11, ISCA’15, Eurosys’14).
Challenge: applications in datacenter
Applications:
• Apache – online latency-critical application;
• PageRank – offline throughput application;
Server configuration:
Experiment: Apache+PageRank vs. Apache only
Performance metrics:
• SSD device latency;
• Response time of the latency-critical application.
Challenge: applications in datacenter
Experiment: Apache+PageRank vs. Apache only
• The throughput-oriented application drastically increases the I/O access latency of the latency-critical application (Fig 1: Apache SSD latency increases due to PageRank).
• This latency increase deteriorates the turnaround response time of the latency-critical application (Fig 2: Apache response time increases due to PageRank).
Challenge: ULL-SSD
There are emerging Ultra-Low-Latency SSD (ULL-SSD) technologies, which can be used for faster I/O services in the datacenter.

|           | Z-NAND         | XL-Flash       | Optane           | nvNitro  |
|-----------|----------------|----------------|------------------|----------|
| Technique | New NAND flash | New NAND flash | Phase-change RAM | MRAM     |
| Vendor    | Samsung        | Toshiba        | Intel            | Everspin |
| Read      | 3us            | N/A            | 10us             | 6us      |
| Write     | 100us          | N/A            | 10us             | 6us      |
Challenge: ULL-SSD
In this work, we use an engineering sample of Z-SSD.

Z-NAND1:
• Technology: SLC-based 3D NAND, 48 stacked word-line layers;
• Capacity: 64Gb/die;
• Page size: 2KB/page.

A Z-NAND-based drive is referred to as a “Z-SSD”.
[1] Cheong, Wooseong, et al. "A flash memory controller for 15μs ultra-low-latency SSD using high-speed 3D NAND flash with 3μs read time." 2018 IEEE International Solid-State Circuits Conference (ISSCC), 2018.
Challenge: Datacenter server with ULL-SSD
Applications:
• Apache – online latency-critical application;
• PageRank – offline throughput application;
Server configuration:
Device latency analysis (figure annotations: 42x, 36us, 28us).
Unfortunately, the short-latency characteristics of the ULL-SSD cannot be exposed to users (in particular, to the latency-critical applications).
Challenge: Datacenter server with ULL-SSD
The storage stack (caching layer, filesystem, blkmq, NVMe driver) is unaware of the characteristics of both the latency-critical workload and the ULL-SSD. The current design of the blkmq layer, the NVMe driver, and the SSD firmware can hurt the performance of latency-critical applications: the ULL-SSD fails to deliver its short latency because of the storage stack.
Blkmq layer: challenge
On the I/O submission path, blkmq consists of a software queue and a hardware queue; incoming requests are queued and merged in the software queue before being dispatched.
Software queue: holds latency-critical I/O requests for a long time.
Blkmq layer: challenge
Software queue: holds latency-critical I/O requests for a long time.
Hardware queue: dispatches an I/O request only when a token is available, without knowledge of the latency-criticality.
Blkmq layer: optimization
Our solution: bypass.
• Latency-critical I/Os (LatReq): bypass blkmq (no merge, no I/O scheduling) for a faster response; the lost merge opportunity incurs little penalty and is addressed in the NVMe layer.
• Throughput I/Os (ThrReq): merge in blkmq for higher storage bandwidth.
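The bypass policy above can be sketched as follows. This is an illustrative Python model, not the actual Linux blkmq code; the field names `latency_critical`, `sector`, and `nsect` are hypothetical stand-ins for the real request attributes:

```python
def dispatch(requests):
    """Split incoming requests: latency-critical ones go straight to the
    device queue; throughput ones are merged when their I/O ranges are
    adjacent, then held in the software queue."""
    device_queue = []     # requests handed directly to the NVMe driver
    software_queue = []   # throughput requests awaiting merge/scheduling
    for req in requests:
        if req["latency_critical"]:
            device_queue.append(req)          # bypass: no merge, no scheduling
        else:
            # merge with the last queued request if the sector ranges touch
            if (software_queue and
                    software_queue[-1]["sector"] + software_queue[-1]["nsect"]
                    == req["sector"]):
                software_queue[-1]["nsect"] += req["nsect"]
            else:
                software_queue.append(req)
    return device_queue, software_queue

dq, sq = dispatch([
    {"sector": 0,   "nsect": 8, "latency_critical": False},
    {"sector": 8,   "nsect": 8, "latency_critical": False},  # merges with previous
    {"sector": 100, "nsect": 8, "latency_critical": True},   # bypasses the queue
])
# dq holds the one latency-critical request; sq holds one merged 16-sector request
```

The design choice is visible in the two return values: a LatReq never waits behind merging or scheduling, while ThrReqs still benefit from merging for bandwidth.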
NVMe SQ: challenge (bypass is not simple enough)
At the NVMe protocol level, each core rings the SQ doorbell and the NVMe controller fetches commands in order from the per-core submission queues (SQs). A latency-critical I/O request can therefore be blocked by prior requests while it waits to be fetched: its time cost is T_fetch-self + 2xT_fetch, more than 200% overhead.
NVMe SQ: optimization
Target: designing a responsiveness-aware NVMe submission.
Key insight:
• Conventional NVMe controllers allow customizing the standard arbitration strategy for different NVMe protocol-level queue accesses.
• Thus, we can make the NVMe controller decide which NVMe command to fetch by sharing a hint of the I/O urgency.
NVMe SQ: optimization
Our solution:
1. Double the SQs (a Lat-SQ for latency-critical I/Os and a Thr-SQ for throughput I/Os);
2. Double the SQ doorbell register;
3. New arbitration strategy: give the highest priority to the Lat-SQ (latency-critical commands are fetched immediately; throughput commands are postponed).
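The new arbitration can be modeled in a few lines. This is a hedged sketch of the policy, not controller firmware; the queue contents are hypothetical:

```python
from collections import deque

def fetch_next(lat_sq, thr_sq):
    """Pick the next NVMe command to fetch: the Lat-SQ has absolute
    priority over the Thr-SQ under the new arbitration strategy."""
    if lat_sq:
        return lat_sq.popleft()   # immediate fetch of latency-critical command
    if thr_sq:
        return thr_sq.popleft()   # throughput commands are postponed
    return None                   # both queues empty

lat_sq = deque(["LatReq"])
thr_sq = deque(["ThrReq1", "ThrReq2"])
order = [fetch_next(lat_sq, thr_sq) for _ in range(3)]
# order == ["LatReq", "ThrReq1", "ThrReq2"]
```

Even though the throughput commands arrived first in their queue, the latency-critical command is fetched first, which removes the 2xT_fetch blocking shown on the previous slide.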
SSD firmware: challenge
The embedded cache in the SSD provides the fastest response (DRAM service): a cache hit costs T_CL + T_CACHE, while a miss costs T_CL + T_FTL + T_NAND + T_CACHE. However:
• The embedded cache cannot protect latency-critical I/O data from eviction;
• The embedded cache can be polluted by throughput requests (in the example, later throughput requests evict the cached latency-critical lines).
SSD firmware: optimization
Our design splits the internal cache space and reserves a protection region for latency-critical I/O requests, so throughput requests can no longer evict their cache lines.
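A minimal model of the split cache follows. The class and method names are hypothetical (this is an illustration of the idea, not the FlashShare firmware): each region runs its own LRU, so a throughput miss can only evict lines in the shared region, never in the protection region.

```python
from collections import OrderedDict

class SplitCache:
    """Embedded cache split into a protection region (latency-critical)
    and a shared region (throughput), each with independent LRU eviction."""

    def __init__(self, protected_ways, shared_ways):
        self.regions = {
            True:  (OrderedDict(), protected_ways),  # protection region
            False: (OrderedDict(), shared_ways),     # shared region
        }

    def access(self, addr, latency_critical):
        """Return True on hit; on miss, fill the caller's own region,
        evicting the LRU line from that region only."""
        lines, cap = self.regions[latency_critical]
        hit = any(addr in region for region, _ in self.regions.values())
        if addr in lines:
            lines.move_to_end(addr)             # refresh LRU position
        elif not hit:
            if len(lines) >= cap:
                lines.popitem(last=False)       # evict within own region only
            lines[addr] = True
        return hit
```

For example, after a latency-critical access to 0x1, a burst of throughput accesses can fill and thrash the shared region, yet a later latency-critical access to 0x1 still hits.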
NVMe CQ: challenge
On the I/O completion path, NVMe completion incurs an MSI overhead for each I/O request: the NVMe controller posts a completion message and raises an MSI; the interrupt controller invokes the CPU's interrupt service routine (cost T_ISR), and reaching the blkmq layer involves two context switches (cost T_CS each). Total cost: 2xT_CS + T_ISR.
NVMe CQ: optimization
Key insight: state-of-the-art Linux supports a poll mechanism. By polling the CQ message from the blkmq layer instead of taking an MSI interrupt, the completion path saves both context switches and the interrupt service routine: 2xT_CS + T_ISR in total.
NVMe CQ: optimization
The poll mechanism can bring benefits to a fast storage device. (Figures: average latency of interrupt vs. polling on the ULL-SSD for 4KB-32KB reads and writes.) Polling decreases average latency by 7.5% for reads and 13.2% for writes.
NVMe CQ: optimization
However, poll-based I/O services consume most host resources. (Figures: memory-bound fraction and CPU utilization over time for polling vs. interrupt.)
NVMe CQ: optimization
Our solution: selective interrupt service routine (Select-ISR). Completions of latency-critical I/Os are polled by the blkmq layer, while completions of throughput I/Os still go through the MSI interrupt service routine (and the CPU can sleep in the meantime).
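The trade-off Select-ISR makes can be captured in a toy cost model. The T_* values below are symbolic placeholders (arbitrary units), not measurements from the paper:

```python
def completion_cost(latency_critical, t_cs, t_isr, t_poll):
    """Estimated completion overhead under Select-ISR: latency-critical
    I/Os are polled (no context switches, no ISR); throughput I/Os take
    the MSI interrupt path costing 2xT_CS + T_ISR."""
    if latency_critical:
        return t_poll                # spin on the CQ message
    return 2 * t_cs + t_isr          # interrupt path cost from the slides

# With T_CS = 2, T_ISR = 3, T_POLL = 1 (arbitrary units):
# a polled completion costs 1, an interrupt-driven completion costs 7.
```

This is why polling only the latency-critical completions keeps their latency low without burning host CPU on every throughput I/O.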
Design: Responsiveness Awareness
Key insight: users have a better knowledge of I/O responsiveness (i.e., latency-critical vs. throughput).
Our approach:
• Open a set of APIs to users, which pass the workload attribute to the Linux PCB: call a new utility, chworkload_attr, which invokes a new system call that modifies the Linux PCB data structure.
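The effect of chworkload_attr can be sketched as below. The dict is a hypothetical stand-in for the attribute field added to the Linux PCB (task_struct), and the function names are invented for illustration; this is not the FlashShare implementation:

```python
LATENCY_CRITICAL, THROUGHPUT = "latency-critical", "throughput"
workload_attr = {}                     # pid -> attribute (stand-in for a PCB field)

def set_workload_attr(pid, attr):
    """What the new system call would record in the process's PCB."""
    workload_attr[pid] = attr

def get_workload_attr(pid):
    """Processes without an explicit attribute are treated as throughput."""
    return workload_attr.get(pid, THROUGHPUT)

set_workload_attr(1234, LATENCY_CRITICAL)   # e.g., tagging the Apache process
# get_workload_attr(1234) -> "latency-critical"
# get_workload_attr(9999) -> "throughput" (default)
```

Once the attribute lives in the PCB, every layer that handles an I/O issued by that process can look it up, which is what the next slide's propagation path relies on.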
Design: Responsiveness Awareness
Key insight: users have a better knowledge of I/O responsiveness (i.e., latency-critical vs. throughput).
Our approach:
• Open a set of APIs to users, which pass the workload attribute to the Linux PCB;
• Deliver the workload attribute to each layer of the storage stack: from the user process (task_struct) to the page cache (address_space), the file system (BIO), the block layer (blk-mq request), the NVMe driver (the rsvd2 field of nvme_rw_command in nvme_cmd), and finally the NVMe controller and its embedded cache (tag array).
More optimizations
Advanced caching-layer designs:
• Dynamic cache-split scheme: maximizes cache hits across various request patterns;
• Read prefetching: better utilizes the SSD's internal parallelism;
• Adjustable read prefetching with a ghost cache: adapts to different request patterns.
Hardware accelerator designs:
• Offload simple but time-consuming tasks such as I/O polling and I/O merging;
• Simplify the design of blkmq and the NVMe driver.
Experiment Setup

Test environment: http://simplessd.org

System configurations:
• Vanilla – a vanilla Linux-based computer system running on a Z-SSD;
• CacheOpt – Vanilla, plus the optimized cache layer in the SSD firmware;
• KernelOpt – optimizes the blkmq layer and NVMe I/O submission;
• SelectISR – KernelOpt, plus the selective-ISR optimization.
Evaluation: latency breakdown
• KernelOpt reduces the time cost of the blkmq layer by 46%, thanks to eliminating the queuing time;
• As latency-critical I/Os are fetched by the NVMe controller immediately, KernelOpt drastically reduces the waiting time;
• CacheOpt better utilizes the embedded cache layer and reduces the SSD access delays by 38%;
• By selectively using the polling mechanism, SelectISR reduces the I/O completion time by 5us.
Evaluation: online I/O access
• CacheOpt reduces the average I/O service latency, but it cannot eliminate the long tails;
• KernelOpt removes the long tails, because it avoids long queuing times and prevents throughput I/Os from blocking latency-critical I/Os;
• SelectISR further reduces the average latency, thanks to selectively using the poll mechanism.
Conclusion
Observation: the ultra-low latency of new memory-based SSDs is not exposed to latency-critical applications and brings no benefit from the user-experience angle.
Challenge: piecemeal reformations of the current storage stack will not work due to multiple barriers; the storage stack is unaware of the behaviors of both the ULL-SSD and latency-critical applications.
Our solution: FlashShare exposes different levels of I/O responsiveness to the key components of the current storage stack and optimizes the corresponding system layers to make ultra-low latency visible to users (latency-critical applications).
Major results:
• Reduces average turnaround response times by 22%;
• Reduces 99th-percentile turnaround response times by 31%.
Thank you