
  • 1

    Ziggurat: A Tiered File System for Non-Volatile Main Memories and Disks

    Shengan Zheng†, Morteza Hoseinzadeh§, Steven Swanson§

    † Shanghai Jiao Tong University    § University of California, San Diego

  • 2

    Background

    • Non-volatile main memory (NVMM)
      – Byte-addressability
      – Persistence
      – Direct access (DAX)

    • NVMM file systems
      – PMFS, SCMFS, NOVA
      – EXT4-DAX, XFS-DAX
      – Capacity?

    [Figure: DRAM + Flash NVDIMM and 3D-XPoint NVDIMM devices]

  • 3–4

    Motivation

    [Figure: price ($/GB, 0.01–10) versus bandwidth (100 MB/s–10 GB/s) for Hard Disk Drive, SATA SSD, NVMe SSD, Optane SSD, NVMM, and DRAM]

  • 5

    Tiered Storage System

    [Figure: two-tier storage hierarchy with SSD above HDD]

    • SSD for speed
    • HDD for capacity

  • 6

    Tiered Storage System

    [Figure: three-tier storage hierarchy with NVMM above SSD and HDD]

    • NVMM for speed
    • Disks for capacity

  • 7

    Ziggurat Overview

    • Intelligent data placement policy
      – Send writes to the most suitable tier
      – High NVMM space utilization

    • Accurate predictors
      – Predict the synchronicity of each file (synchronicity predictor)
      – Predict the size of future writes to each file (write size predictor)

    • Efficient migration mechanism
      – Only migrate cold data in cold files
      – Migrate file data in groups

  • 8

    Outline

    • Motivation
    • Data placement policy
    • Migration mechanism
    • Evaluation
    • Conclusion

  • 9

    Data Placement Policy

    • Although NVMM is the fastest tier in Ziggurat, file writes should not always go to NVMM.

    • The write size predictor and the synchronicity predictor together decide where each write goes (see the sketch below).

    [Figure: placement decision tree — small writes go to NVMM; large writes go to NVMM if the file is synchronously updated and to disk if it is asynchronously updated]
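
    A minimal sketch of this placement decision (illustrative only, not Ziggurat's kernel code; the function name and parameters are hypothetical):

    # Only writes predicted to be both large/stable and asynchronous go to disk;
    # everything else is absorbed in NVMM (assumed decision logic per the slide).
    def choose_tier(predicted_large_and_stable: bool, predicted_synchronous: bool) -> str:
        if predicted_large_and_stable and not predicted_synchronous:
            return "disk"   # large asynchronous writes: staged in DRAM, flushed to disk
        return "nvmm"       # small or synchronous writes: written directly to NVMM

    # Usage: a large, stable write to an asynchronously-updated file lands on disk.
    assert choose_tier(True, False) == "disk"
    assert choose_tier(True, True) == "nvmm"
    assert choose_tier(False, False) == "nvmm"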

  • 10–20

    Synchronicity Predictor

    • Predict whether future accesses to a file are likely to be synchronous.

    • The predictor counts the data blocks written to the file since the last fsync() and compares the count against a threshold (4 blocks in this example; a sketch follows below).

    • Example 1: write(0,2); fsync(); write(2,2); fsync(); write(4,2); fsync();
      – Each fsync() sees only 2 data blocks written (2/4), so the file is classified as synchronous.

    • Example 2: write(0,2); write(2,2); write(4,2); fsync();
      – The single fsync() sees 6 data blocks written (6/4), so the file is classified as asynchronous.

    [Figure: per-file logs of write entries (offset, length) and the "data blocks written" counter, which resets after each fsync()]
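
    A minimal sketch of the counter logic above (illustrative; the real implementation may differ — the 4-block threshold is taken from the slide's example):

    class SynchronicityPredictor:
        """Tracks how many blocks were written to a file since the last fsync()."""
        def __init__(self, threshold_blocks: int = 4):
            self.threshold = threshold_blocks
            self.blocks_since_fsync = 0
            self.synchronous = False

        def on_write(self, length_blocks: int) -> None:
            self.blocks_since_fsync += length_blocks

        def on_fsync(self) -> None:
            # Little dirty data per fsync() suggests synchronous updates.
            self.synchronous = self.blocks_since_fsync <= self.threshold
            self.blocks_since_fsync = 0

    # Example 1: write(0,2); fsync(); repeated three times -> synchronous.
    p = SynchronicityPredictor()
    for _ in range(3):
        p.on_write(2)
        p.on_fsync()
    assert p.synchronous

    # Example 2: three writes, then one fsync() covering 6 blocks -> asynchronous.
    q = SynchronicityPredictor()
    for _ in range(3):
        q.on_write(2)
    q.on_fsync()
    assert not q.synchronous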

  • 21–30

    Write Size Predictor

    • Predict whether the incoming writes are both large and stable.

    • Each write entry records (offset, length, counter). A write is "large" if its length is at least 4 blocks (in this example) and "stable" if a predecessor write entry is found in the file log (sketched below).

    [Figure: the file's 8 data blocks and its log, which initially holds the write entries (0,4,3), (4,4,1), (5,1,0), (6,1,0)]

    • Example: write(0,4); write(6,1); write(4,4);
      – write(0,4): predecessor found, length ≥ 4 → large and stable; new entry (0,4,4)
      – write(6,1): predecessor found, length < 4 → small and stable; new entry (6,1,0)
      – write(4,4): predecessor not found, length ≥ 4 → large but unstable; new entry (4,4,0)
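
    A simplified sketch of the per-write classification (illustrative; the predecessor lookup and counter bookkeeping in the real file system are more involved, so the predecessor result is passed in as a flag here):

    LARGE_THRESHOLD_BLOCKS = 4   # "large" cutoff taken from the slide's example

    def classify_write(predecessor_found: bool, length_blocks: int) -> str:
        size = "large" if length_blocks >= LARGE_THRESHOLD_BLOCKS else "small"
        stability = "stable" if predecessor_found else "unstable"
        return f"{size}, {stability}"

    # The three writes from the slide:
    assert classify_write(True, 4) == "large, stable"     # write(0,4)
    assert classify_write(True, 1) == "small, stable"     # write(6,1)
    assert classify_write(False, 4) == "large, unstable"  # write(4,4)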

  • 31

    Cold Data Identification

    • Average modification time (amtime)
      – Updated differentially (sketched below)

    Write entry: offset, length, mtime

    • Example: a file whose log holds the write entries (0,2,2), (2,2,4), (4,2,6):
      amtime = (2·2 + 4·2 + 6·2) / (2 + 2 + 2) = 4

    • After a new write entry (6,2,8) is appended, amtime is updated differentially from the old average:
      amtime = (4·6 + 8·2) / (6 + 2) = 5
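
    A small sketch of the differential update (illustrative only): the new average is computed from the old average, the number of blocks it already covers, and the new write entry alone, rather than by rescanning every write entry.

    def update_amtime(old_amtime: float, old_blocks: int,
                      new_mtime: float, new_blocks: int) -> float:
        total = old_blocks + new_blocks
        return (old_amtime * old_blocks + new_mtime * new_blocks) / total

    # The example above: amtime 4 over 6 blocks, plus entry (6,2,8), gives 5.
    assert update_amtime(4, 6, 8, 2) == 5.0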

  • 32

    Cold Data Identification

    • Among files
      – Per-CPU cold lists of files sorted by amtime

    • Within each file
      – Data blocks are cold or hot relative to the file's amtime

    [Figure: per-CPU (CPU 0, CPU 1) cold lists of files ordered from cold to hot by amtime; within a file with amtime = 5, the write entries (0,2,2) and (2,2,4) are cold while (4,2,6) and (6,2,8) are hot]

  • 33–38

    Migration Mechanism

    • Basic migration (sketched below)
      – Migrate the coldest data to disk
      – Consistency is ensured

    • Reverse migration
      – Migrate from disk to NVMM
      – Handle mmap
      – Read-dominated workloads

    [Figure: basic migration proceeds in five steps — the cold file pages are copied from NVMM to disk, a new write entry pointing to the disk copies is appended to the inode log and the log tail is updated, the old write entry and pages become stale, and the stale NVMM pages are freed]
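
    A high-level, in-memory sketch of basic migration (illustrative only; all names are hypothetical, and the real file system persists and orders each step so that a crash never loses data):

    def migrate_extent(inode_log: list, nvmm_pages: dict, disk_pages: dict,
                       offset: int, length: int) -> None:
        # Copy the cold pages from NVMM to disk.
        for blk in range(offset, offset + length):
            disk_pages[blk] = nvmm_pages[blk]
        # Append a write entry pointing at the disk copy; the old entry becomes stale.
        inode_log.append({"offset": offset, "length": length, "tier": "disk"})
        # Free the now-stale NVMM pages.
        for blk in range(offset, offset + length):
            del nvmm_pages[blk]

    # Example: migrate a two-block cold extent to disk.
    log, nvmm, disk = [], {2: "page 3", 3: "page 4"}, {}
    migrate_extent(log, nvmm, disk, offset=2, length=2)
    assert disk == {2: "page 3", 3: "page 4"} and nvmm == {}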

  • 39–44

    Migration Mechanism

    • Group migration
      – Coalesce adjacent write entries (sketched below)
      – Utilize the high sequential bandwidth of disks

    • Benefits
      – Improve migration efficiency
      – Accelerate future reads
      – Reduce log size
      – Moderate disk fragmentation

    [Figure: group migration coalesces the live pages covered by the write entries for 0-8K, 4-8K, 8-16K, and 12-16K into one sequential 0-16K extent on disk, appends a single "Write 0-16K" entry to the inode log, and frees the stale NVMM pages]
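
    A sketch of the coalescing step (illustrative; names and data layout are hypothetical): adjacent or overlapping (offset, length) write extents are merged so their data can be flushed to disk as one sequential chunk.

    def coalesce(extents: list) -> list:
        """Merge (offset, length) extents that touch or overlap."""
        merged = []
        for off, length in sorted(extents):
            if merged and off <= merged[-1][0] + merged[-1][1]:
                prev_off, prev_len = merged[-1]
                merged[-1] = (prev_off, max(prev_off + prev_len, off + length) - prev_off)
            else:
                merged.append((off, length))
        return merged

    # The slide's example in 4 KB blocks: 0-8K, 4-8K, 8-16K, and 12-16K
    # coalesce into a single 0-16K extent.
    assert coalesce([(0, 2), (1, 1), (2, 2), (3, 1)]) == [(0, 4)]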

  • 45–57

    File Operations

    • Example operation sequence on a 24-block file:
      write(0,8); write(8,8); fsync();
      append(16,1); append(17,1); ... append(23,1);
      ... migrate(); ...
      read(16,8); mmap(0,8);

    [Figure: how the file's data blocks 0-23 move between NVMM, DRAM, and disk as the operations proceed]

  • 58

    Evaluation

    • Experimental setup
      – Dual-socket Intel Xeon E5 server
      – 256 GB DRAM
      – 187 GB Intel DC P4800X Optane SSD
      – 400 GB Intel DC P3600 NVMe SSD
      – Ubuntu 16.04 LTS, Linux kernel 4.13.0

    • Emulated NVMM
      – NUMA effect on DRAM
      – NUMA node 1: NVMM emulation
      – NUMA node 0: processors and memory for applications

    • Disk-based file systems
      – EXT4-DJ (data journaling)
      – XFS-ML (metadata logging)

    • NVMM-based file systems
      – NOVA, EXT4-DAX, XFS-DAX

    • Ziggurat
      – Ziggurat-X (Optane/NVMe SSD + X GB of NVMM)

  • 59

    Filebench

    • With a large amount of NVMM, Ziggurat's performance nearly matches that of NVMM-only file systems.

    [Figure: Fileserver and Varmail throughput; annotated speedups: within 3%, 2.1x, and 2.6x]

  • 60

    RocksDB

    • Ziggurat performs well when inserting data with write-ahead logging (WAL).

    [Figure: random-insert and sequential-insert throughput; annotated speedups: 9.9x, 42.4x, 5.7x, and 13.7x]

  • 61

    SQLite

    • Ziggurat maintains near-NVMM performance because the hot journal files are frequently updated and therefore stay in NVMM.

    [Figure: throughput in PERSIST and WAL journaling modes; annotated speedups: 1.6x, 4.8x, 1.4x, and 5.3x]

  • 62

    Conclusion

    • Ziggurat fully utilizes the strengths of NVMM and disks to offer high performance for a wide range of file access patterns.

    • [Prediction] Ziggurat steers the incoming writes to the most suitable tier based on the prediction results.

    • [Migration] Ziggurat coalesces adjacent data blocks and migrates them in large chunks to disks.

    • [Evaluation] Ziggurat achieves up to 38.9X and 46.5X throughput improvement compared with EXT4 and XFS running on an SSD alone, respectively.

  • 63

    Thank you
