![Page 1: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/1.jpg)
Designing a True Direct-Access File System with DevFS
Yuangang Wang, Jun Xu, Gopinath Palani
Huawei Technologies
Sudarsun Kannan, Andrea Arpaci-Dusseau, Remzi Arpaci-Dusseau
University of Wisconsin-Madison
![Page 2: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/2.jpg)
Modern Fast Storage Hardware
• Faster nonvolatile memory technologies such as NVMe, 3D XPoint

| | Hard Drives | PCIe-Flash | 3D XPoint |
|---|---|---|---|
| H/W Lat | 7.1ms | 68us | 12us |
| BW | 2.6MB/s | 250MB/s | 1.3GB/s |
| S/W cost | 8us | 8us | 6us |
| OS cost | 5us | 5us | 4us |

• Bottlenecks shift from hardware to software (file system)
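The shift from hardware to software can be made concrete with a little arithmetic on the latency figures above. The sketch below assumes that "S/W cost" is the total software overhead per request and that it simply adds to the hardware latency; the slide does not state how the two combine, so treat the percentages as illustrative.

```python
# Latency figures from the slide, in microseconds.
# Assumption: "S/W cost" is the whole software overhead per request and
# is added on top of the hardware latency.
DEVICES = {
    "Hard drive": {"hw_us": 7100.0, "sw_us": 8.0},
    "PCIe flash": {"hw_us": 68.0, "sw_us": 8.0},
    "3D XPoint": {"hw_us": 12.0, "sw_us": 6.0},
}


def software_share(hw_us: float, sw_us: float) -> float:
    """Fraction of end-to-end latency spent in software."""
    return sw_us / (hw_us + sw_us)


if __name__ == "__main__":
    for name, lat in DEVICES.items():
        share = software_share(lat["hw_us"], lat["sw_us"])
        print(f"{name:<11} software share: {share:6.1%}")
```

Under these assumptions the software share grows from roughly 0.1% on a hard drive to about 10.5% on PCIe flash and about 33% on 3D XPoint.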
![Page 3: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/3.jpg)
Why Use OS File System?
• Millions of applications use OS-level file system (FS)
- Guarantees integrity, concurrency, crash-consistency, and security
• Object stores have been designed to reduce OS cost [HDFS, CEPH]
- Developers unwilling to modify POSIX-interface
- Need faster file systems and not new interface
• User-level POSIX-based FS fail to satisfy fundamental properties
![Page 4: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/4.jpg)
Device-level File System (DevFS)
• Move file system into the device hardware
• Use device-level CPU and memory for DevFS
• Apps. bypass OS for control and data plane
• DevFS handles integrity, concurrency, crash-consistency, and security
• Achieves true direct-access
[Diagram: Application → Read/Write data → DevFS inside the NVMe device (metadata and data blocks). The FS kernel steps "Check security", "Update metadata", and "Update data" are shown again inside DevFS.]
![Page 5: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/5.jpg)
Challenges of Hardware File System
• Limited memory inside the device
- Reverse-cache inactive file system structures to host memory
• DevFS lacks visibility into OS state (e.g., process permission)
- Make OS share required (process) information with “down-call”
![Page 6: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/6.jpg)
Performance
• Emulate DevFS at the device-driver level
• Compare DevFS with state-of-the-art NOVA file system
• Benchmarks - more than 2X write and 1.8X read throughput
• Snappy compression application - up to 22% higher throughput
• Memory-optimized design reduces file system memory by 5X
![Page 7: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/7.jpg)
Outline
Introduction
Background
Motivation
DevFS Design
Evaluation
Conclusion
![Page 8: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/8.jpg)
Traditional S/W Storage Stack
[Diagram: Application → Read/Write data → FS kernel (check security, update metadata, update data) → NVMe (metadata and data blocks)]
• Maintain security, manage integrity, crash-consistency, and concurrency
![Page 9: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/9.jpg)
Traditional S/W Storage Stack
[Diagram: Application → Read/Write data → FS kernel (check security, update metadata, update data) → NVMe (metadata and data blocks)]
• User-to-kernel switch for every data plane operation
• High software-indirection latency before storage access
![Page 10: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/10.jpg)
Holy grail of Storage Research
Challenge 1: How to bypass OS and provide direct-storage access?
Challenge 2: How to provide direct-access without compromising integrity, concurrency, crash-consistency, and security?
[Diagram labels: Application with FS library, Read/Write data, FS kernel, SSD holding metadata and data]
![Page 11: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/11.jpg)
Classes of Direct-Access File Systems
• Prior approaches have attempted to provide user-level direct access
• We categorize them into four classes:
- Hybrid user-level
- Hybrid user-level with trusted server (Microkernel approach)
- Hybrid device
- Full device-level file system (proposed)
![Page 12: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/12.jpg)
Hybrid User-level File System
• Split file system into user library and kernel file components
• Library handles data plane (e.g., read, write) and manages metadata
• Kernel FS handles control plane (e.g., file creation)
Well known hybrid approaches
- Arrakis (OSDI ’14)
- Strata (SOSP ’17)
[Diagram: Application with FS lib → Read/Write data → NVMe; Create file → FS kernel (sharing, protection)]
![Page 13: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/13.jpg)
Hybrid Device File System
• File system split across user-level library, kernel, and hardware
• Control and data-plane operations same as hybrid user-level FS
• However, some functionalities moved inside the hardware
Well known hybrid approaches
- Moneta-D (ASPLOS ‘12)
- TxDev (OSDI ‘08)
[Diagram: Application with FS lib (manage metadata) → Read/Write data → NVMe with FS H/W (Perm. Check, Tx); Create file → FS kernel (sharing, protection)]
![Page 15: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/15.jpg)
File System Properties
• Integrity
- Correctness of FS metadata for single & concurrent access
• Crash-consistency
- FS metadata consistent after a failure
• Security
- No permission violation for both control and data-plane
- OS-level file system checks permission for control and data plane
![Page 16: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/16.jpg)
Hybrid User-level FS Integrity Problem
[Diagram: Application with FS lib manages metadata and has direct access for the data plane to NVMe (metadata, data); Create file → FS kernel, which coordinates sharing and protection]
Arrakis (OSDI ’14), Strata (SOSP ’17)
![Page 17: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/17.jpg)
Hybrid User-level FS Integrity Problem
[Diagram: Application with FS lib manages metadata on NVMe (metadata, data) and is marked untrusted (buggy or malicious); Create file → FS kernel, which coordinates sharing and protection]
• Can compromise metadata integrity and impact crash consistency
• Data plane security compromised
![Page 18: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/18.jpg)
Concurrent Access?
[Diagram: App. 1 and App. 2, each with its own FS lib, both issue Append(F1, buff, 4k). Each one sets the free block bitmap, appends a data block, and updates the inode, but skips locking. Two inode states are shown: {size = 0, m_time = 2} and {size = 4K, m_time = 1}.]
• Arrakis and Strata trap into OS for data plane and control plane: no direct access
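The hazard of skipping the lock can be replayed in a few lines. This is a minimal sketch, not code from Arrakis, Strata, or DevFS: `Inode`, `racy_appends`, and `locked_append` are hypothetical names, and the racy case replays one specific bad interleaving deterministically rather than relying on thread timing.

```python
import threading
from dataclasses import dataclass, field

BLOCK = 4096  # 4K append, as on the slide


@dataclass
class Inode:
    size: int = 0
    lock: threading.Lock = field(default_factory=threading.Lock)


def racy_appends(inode: Inode) -> None:
    """Replay one bad interleaving of two appends that skip locking."""
    seen_by_app1 = inode.size          # App. 1 reads the current size
    seen_by_app2 = inode.size          # App. 2 reads the same stale size
    inode.size = seen_by_app1 + BLOCK  # App. 1 publishes its update
    inode.size = seen_by_app2 + BLOCK  # App. 2 overwrites it: lost update


def locked_append(inode: Inode) -> None:
    """Append with the inode's read-modify-write done under a lock."""
    with inode.lock:
        inode.size = inode.size + BLOCK
```

After `racy_appends` the inode records 4K even though two 4K appends were issued; two threads calling `locked_append` always leave it at 8K.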
![Page 19: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/19.jpg)
Approaches Summary
Properties compared (table columns): Integrity, Crash consistency, Security, Concurrency, POSIX support, Direct-access

| Class | File System |
|---|---|
| Kernel-level FS | NOVA |
| Hybrid user-level FS | Arrakis, Strata |
| Microkernel | Aerie |
| Hybrid-device FS | Moneta-D, TxDev |
| FUSE | Ext4-FUSE |
| Device FS | DevFS |
![Page 21: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/21.jpg)
Device-level File System (DevFS)
• Move file system into the device hardware
• Use device-level CPU and memory for DevFS
• Apps. bypass OS for control and data plane
• DevFS handles integrity, concurrency, crash-consistency, and security
• Achieves true direct-access
[Diagram: Application → Read/Write data → DevFS inside the NVMe device (metadata and data blocks). The FS kernel steps "Check security", "Update metadata", and "Update data" are shown again inside DevFS.]
![Page 22: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/22.jpg)
DevFS Internals
[Diagram: DevFS runs on the controller CPU. Global structures (super block, bitmaps, inodes, dentries) are kept both as in-memory metadata and as on-disk file metadata. Per-file structures sit alongside them.]
![Page 23: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/23.jpg)
DevFS Internals
• Modern storage devices contain multiple CPUs
• Support up to 64K I/O queues
• To exploit concurrency, each file has its own I/O queue and journal
[Diagram: DevFS on the controller CPU. Global structures (super block, bitmaps, inodes, dentries) as in-memory metadata and on-disk file metadata. Per-file structures: submission queue (SQ), completion queue (CQ), per-file journal, per-file data blocks. In-memory filemap tree: /root, /root/dir, /root/proc.]
filemap { *dentry; *inode; *queues; *mem_journal; *disk_journal }
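The per-file queue and journal idea can be sketched as a small model. This is an illustrative sketch only: `FileMap` loosely mirrors the slide's `filemap` fields, while `DevFSSketch`, `submit_write`, and `process` are hypothetical names, and a real device would drain queues on controller CPUs rather than in a Python loop.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class FileMap:
    """Per-file state, loosely following the slide's filemap fields."""
    path: str
    submission_q: deque = field(default_factory=deque)
    completion_q: deque = field(default_factory=deque)
    mem_journal: list = field(default_factory=list)
    blocks: dict = field(default_factory=dict)


class DevFSSketch:
    def __init__(self) -> None:
        self.filemaps: dict = {}

    def open(self, path: str) -> FileMap:
        # Each file gets its own queues and journal the first time it is opened.
        return self.filemaps.setdefault(path, FileMap(path))

    def submit_write(self, path: str, offset: int, data: bytes) -> None:
        self.filemaps[path].submission_q.append(("write", offset, data))

    def process(self, path: str) -> None:
        """Drain one file's queue without touching any other file's state."""
        fm = self.filemaps[path]
        while fm.submission_q:
            op, offset, data = fm.submission_q.popleft()
            fm.mem_journal.append((op, offset, len(data)))  # log intent first
            fm.blocks[offset] = data                        # then apply it
            fm.completion_q.append((op, offset, "ok"))
```

Because no queue or journal is shared, work on `/root/a` never has to wait behind pending commands for `/root/b`.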
![Page 24: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/24.jpg)
DevFS Internals
[Diagram: Application with user FS lib calls Vaddr = CreateBuffer() to obtain an OS allocated command buffer. DevFS on the controller CPU holds global structures (super block, bitmaps, inodes, dentries) as in-memory metadata and on-disk file metadata, plus per-file structures: submission queue (SQ), completion queue (CQ), per-file journal, per-file data blocks, and the in-memory filemap tree (/root, /root/dir, /root/proc).]
filemap { *dentry; *inode; *queues; *mem_journal; *disk_journal }
![Page 25: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/25.jpg)
DevFS I/O Operation
[Diagram: Application with user FS lib issues Open(f1). Commands (Cmd) are placed in the OS allocated command buffer and reach the file's submission queue (SQ) and completion queue (CQ) in DevFS on the controller CPU. Per-file structures: per-file journal (two journal entries shown) and per-file data blocks. Global structures (super block, bitmaps, inodes, dentries) as in-memory metadata and on-disk file metadata. In-memory filemap tree: /root, /root/dir, /root/proc.]
filemap { *dentry; *inode; *queues; *mem_journal; *disk_journal }
![Page 26: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/26.jpg)
DevFS I/O Operation
[Diagram: same Open(f1) flow through the OS allocated command buffer, the per-file submission queue (SQ), and the completion queue (CQ); the per-file journal now shows three journal entries.]
filemap { *dentry; *inode; *queues; *mem_journal; *disk_journal }
![Page 27: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/27.jpg)
DevFS I/O Operation
[Diagram: Application with user FS lib issues Write(fd, buff, 4k, off=3). Commands (Cmd) go through the OS allocated command buffer to the file's submission queue (SQ) and completion queue (CQ) in DevFS on the controller CPU. Per-file structures: per-file journal (three journal entries shown) and per-file data blocks. Global structures (super block, bitmaps, inodes, dentries) as in-memory metadata and on-disk file metadata.]
filemap { *dentry; *inode; *queues; *mem_journal; *disk_journal }
![Page 28: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/28.jpg)
Capacitance Benefits Inside H/W
• Writing journals to storage has high overheads
• Modern storage devices have device-level capacitors
• Capacitors safely flush memory state to storage after power failure
• DevFS uses device memory for file system state
- Can avoid writing in-memory state to disk journal
- Overcomes the “double writes” problem
• Capacitance support improves performance
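The "double writes" saving can be illustrated by counting bytes that reach persistent media. This is a simplified sketch: the `Storage` counter and both write helpers are hypothetical, and it assumes the journaled payload is the same size as the final write, which the slide does not specify.

```python
BLOCK = 4096


class Storage:
    """Counts the bytes that actually reach persistent media."""

    def __init__(self) -> None:
        self.bytes_written = 0

    def write(self, nbytes: int) -> None:
        self.bytes_written += nbytes


def write_with_disk_journal(dev: Storage, nbytes: int) -> None:
    dev.write(nbytes)  # first copy goes to the on-disk journal
    dev.write(nbytes)  # second copy goes to the final location


def write_with_capacitor(dev: Storage, mem_journal: list, nbytes: int) -> None:
    mem_journal.append(nbytes)  # journal entry stays in device memory
    dev.write(nbytes)           # only the final copy reaches storage
```

For one 4K write the journaled path sends 8K to storage and the capacitor-backed path sends 4K; the in-memory journal is only flushed if power fails.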
![Page 29: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/29.jpg)
Challenges of Hardware File System
• Limited memory inside the storage device (today’s focus)
- Reverse-cache inactive file system structures to host memory
• DevFS lacks visibility into OS state (e.g., process permission)
- Make OS share required information with “down-call”
- Please see the paper for more details
![Page 30: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/30.jpg)
Device Memory Limitation
• Device RAM size constrained by cost ($) and power consumption
• RAM used mainly by flash translation layer (FTL)
- RAM size proportional to FTL’s logical-to-physical block mapping
- Example: 512 GB SSD uses 2 GB RAM to support translations
Unlike kernel FS, device FS footprint must be kept small
![Page 31: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/31.jpg)
Memory Consuming File Structures
• Our analysis shows four in-memory structures using 90% of memory
- Inode (840 bytes) - created for file open, not freed until deletion
- Dentry (192 bytes) - created for file open, kept in a cache
- File pointer (256 bytes) - released when file is closed
- Others (156 bytes) - e.g., DevFS file map structure
• Simple workload - open and close 1 million files
- DevFS memory consumption ~1.2 GB (60% of device memory)
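The ~1.2 GB figure is consistent with a back-of-the-envelope calculation from the per-structure sizes. The sketch below assumes that only the file pointer is released on close and that the device has the 2 GB of RAM (decimal) mentioned in the SSD example; the slides do not spell out how the figure was derived, so this is a plausibility check rather than the authors' accounting.

```python
# Per-file structure sizes in bytes, taken from the slide.
STRUCT_BYTES = {"inode": 840, "dentry": 192, "file_pointer": 256, "others": 156}

# Assumption: only the file pointer is freed when the file is closed.
RELEASED_ON_CLOSE = {"file_pointer"}

NUM_FILES = 1_000_000
DEVICE_RAM_BYTES = 2 * 10**9  # assumed 2 GB of device RAM


def resident_bytes_per_file() -> int:
    """Bytes that stay in device memory after a file is closed."""
    return sum(
        size for name, size in STRUCT_BYTES.items()
        if name not in RELEASED_ON_CLOSE
    )


def total_resident_bytes(num_files: int = NUM_FILES) -> int:
    return resident_bytes_per_file() * num_files


if __name__ == "__main__":
    total = total_resident_bytes()
    print(f"{total / 1e9:.2f} GB, {total / DEVICE_RAM_BYTES:.0%} of device RAM")
```

That is 1,188 bytes per closed file, or about 1.19 GB for one million files, close to 60% of a 2 GB device.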
![Page 32: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/32.jpg)
Reducing Memory Usage
• On-demand allocation of structures
- Structures such as filemap not used after file is closed
- Allocated after first write and released when a file is closed
• Reverse Caching
- Move inactive structures to host memory
![Page 33: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/33.jpg)
Reverse-Caching to Reduce Memory
• Move inactive inode and dentry structures to host memory
[Diagram: DevFS device memory holds the inode list, dentry list, and file pointer list; host memory holds an inode cache and a dentry cache; the application runs on the host.]
0. Reserved during mount
1. close(file)
2. Move to host cache
3. open(file)
4. Check host for dentry and inode
5. Move to device and delete cache
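The close and open steps above can be sketched as a two-level store. This is a minimal model, not the DevFS implementation: `ReverseCache` and its dictionary layout are hypothetical, and real structures would be copied between device and host memory rather than moved between Python dicts.

```python
class ReverseCache:
    def __init__(self) -> None:
        self.device = {}      # structures of open files, in device memory
        self.host_cache = {}  # step 0: host-side cache reserved at mount

    def open(self, path: str) -> dict:
        if path in self.device:
            return self.device[path]
        # Steps 4-5: check the host cache, move the entry back to the
        # device, and delete the host copy.
        meta = self.host_cache.pop(path, None)
        if meta is None:
            meta = {"inode": {"size": 0}, "dentry": {"name": path}}
        self.device[path] = meta
        return meta

    def close(self, path: str) -> None:
        # Steps 1-2: structures of a closed file move to the host cache.
        self.host_cache[path] = self.device.pop(path)
```

Device memory then holds metadata only for files that are currently open, while closed files cost host memory instead.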
![Page 34: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/34.jpg)
Decompose FS Structures
• Reverse caching is complicated for inodes
• Inode’s fields are accessed even after a file is closed (e.g., directory traversal)
• Frequently moving between host cache and device can be expensive!
• Our solution: split file system structures (e.g., inode) into a host and a device structure
![Page 35: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/35.jpg)
Decompose FS Structures
DevFS inode structure (840 bytes)
struct devfs_inode_info {
    inode_list
    page_tree
    journals
    ...
    struct inode vfs_inode
}
Decomposed DevFS structure
struct devfs_inode_info {
    /* always kept in device */
    struct *inode_device
    /* moved to host after close */
    struct *inode_host
}
593 bytes
![Page 37: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/37.jpg)
Evaluation
• Benchmarks and Applications
- Filebench
- Snappy - widely used multi-threaded file compression
• Evaluation comparison
- NOVA - state-of-the-art in-kernel NVM file system
- DevFS-naïve - DevFS without direct access
- DevFS-cap - without direct access but with capacitor support
- DevFS-cap-direct - capacitor support + direct access
• For direct-access, benchmark and applications run as driver
![Page 38: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/38.jpg)
Filebench - Random Write
[Chart: throughput (100K ops/second, axis 0 to 16) for 1KB, 4KB, and 16KB writes; series: NOVA, DevFS-naïve, DevFS-cap, DevFS-cap-direct; callouts: 27% and 2.4X]
• DevFS-naïve suffers from high journaling overhead
• DevFS-cap uses capacitors to avoid on-disk journaling
• DevFS-cap-direct achieves true direct-access bypassing OS
![Page 39: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/39.jpg)
Snappy Compression Performance
Workload steps: Read a file → Compress → Write output → Sync file
[Chart: throughput (100K ops/second, axis 0 to 1.2) versus file size (1KB, 4KB, 16KB, 64KB, 256KB); series: NOVA, DevFS-naïve, DevFS-cap, DevFS-cap-direct; callout: 22%]
• Gains even for compute + I/O intensive application
![Page 40: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/40.jpg)
Memory Reduction Benefits
• Filebench - File Create workload (create 1M files and close files)
[Chart: memory usage (MB, axis 0 to 1600), broken down by filemap, dentry, and inode, for four configurations: Cap (no memory reduction), Demand (on-demand FS structures), Dentry (reverse caching dentry), Inode + Dentry (reverse caching dentry + inode)]
• Demand allocation reduces memory consumption by 156MB (14%)
• Inode and Dentry reverse caching reduces memory by 5X
![Page 41: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/41.jpg)
Memory Reduction Performance Impact
[Chart: throughput (100K ops/sec, axis 0 to 2) for Cap, Demand, Dentry, Inode + Dentry, and Inode + Dentry + Direct; callout: 14%]
• Dentry and Inode reverse caching overhead less than 14%
• Overhead mainly due to structure movement cost
![Page 42: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/42.jpg)
Summary
• Motivation
- Eliminating OS overhead and providing direct access is critical
- Hybrid user-level file systems compromise fundamental properties
• Solution
- We design DevFS that moves FS into the storage H/W
- Provides direct-access without compromising FS properties
- To reduce the memory footprint of DevFS, we design reverse-caching
• Evaluation
- Emulated DevFS shows up to 2X I/O performance gains
- Reduces memory usage by 5X with 14% performance impact
![Page 43: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/43.jpg)
Conclusion
• We are moving towards a storage era with microsecond latency
• Eliminating software (OS) overhead is critical
- But without compromising fundamental storage properties
• Near-hardware access latency requires embedding S/W into H/W
• We take a first step towards moving the file system into H/W
• Several challenges such as H/W integration, support for RAID, snapshots, and deduplication are yet to be addressed
![Page 44: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/44.jpg)
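Permission Checking
[Diagram with numbered steps 1 to 4: APP and User-FS run in user space above the OS. When a process is scheduled to a host CPU, the OS sets its credential in DevFS. The application issues Write(UID, buff, 4k, off=1), and the permission manager in DevFS checks it.]
DevFS credential region (host CPU → credentials): 0 → Task1.cred, 1 → Task1.cred, …, 24 → Task2.cred
Command fields: payload = buff, ops = READ, UID = 1, off = 1, size = 4K
Permission manager:
t_cred = get_task_cred(CPUID)
inode_cred = get_inode_cred(fd)
compare_cred(t_cred, inode_cred)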
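The permission check can be sketched as a per-CPU credential table that the host OS updates and the device consults. This is a hedged illustration: `Cred`, `PermissionManager`, and `down_call` are hypothetical names, and the equality test stands in for `compare_cred`, whose real rules (user, group, mode bits) are not given on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cred:
    uid: int


class PermissionManager:
    def __init__(self) -> None:
        self.cpu_creds = {}    # DevFS credential region: host CPU id -> Cred
        self.inode_creds = {}  # credential required by each open fd

    def down_call(self, cpu_id: int, cred: Cred) -> None:
        """Host OS hook: record the credential of the task now on cpu_id."""
        self.cpu_creds[cpu_id] = cred

    def check(self, cpu_id: int, fd: int) -> bool:
        t_cred = self.cpu_creds.get(cpu_id)    # get_task_cred(CPUID)
        inode_cred = self.inode_creds.get(fd)  # get_inode_cred(fd)
        # Stand-in for compare_cred: allow only an exact credential match.
        return t_cred is not None and t_cred == inode_cred
```

The OS only intervenes when a process is scheduled, so individual reads and writes are checked inside the device without a trap into the kernel.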
![Page 45: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/45.jpg)
Concurrent Access
[Chart: throughput (100K ops/second, axis 0 to 2) versus number of instances (1, 4, 8, 12, 16); series: NOVA, DevFS [+cap], DevFS [+cap +direct]; annotation: limited CPUs inside device]
• Limited device CPUs restrict DevFS scaling
• DevFS uses only 4 device CPUs
![Page 46: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/46.jpg)
Slow CPU Impact - Snappy 4KB
[Chart: throughput (100K ops/sec, axis 0 to 1.2) versus CPU frequency (1.2, 1.4, 1.8, 2.2, 2.6 GHz); series: DevFS [+cap], DevFS [+cap +direct]]
![Page 47: Designing a True Direct-Access File System with DevFSpages.cs.wisc.edu/~sudarsun/docs/devfs_kannan_fast18.pdf · BW: 2.6MB/s 250MB/s 1.3GB/s S/W cost: ... • Object stores have been](https://reader033.vdocuments.us/reader033/viewer/2022051803/5b028d947f8b9a952f901244/html5/thumbnails/47.jpg)
Questions?
Thanks!