Introduction of System Software for Persistent Memory

Makoto Shimazu

@Reading Circle

2014/12/18

S. R. Dulloor (1,3), S. Kumar (1), A. Keshavamurthy (2), P. Lantz (1), D. Reddy (1), R. Sankaran (1), J. Jackson (1)

(1) Intel Labs, (2) Intel Corp, (3) Georgia Institute of Technology

EuroSys 2014

Contributions

Introduction of pm_wbarrier

File system architecture optimized for PM

light-weight and consistent POSIX file system

memory-mapped I/O

protection from stray writes

Performance evaluation with PM emulator

Outline

Volatile cache problem

Architecture

Consistency

Write protection from stray writes

Implementation

Evaluation

Related Work

Conclusion

Caching problem in PM

Flushing the cache explicitly (clflush) works well

However, clflush cannot flush data out of the memory controller (MC)

[Figure: load/store to DRAM, read/write to SSD/HDD, and load/store to PM; labels: Cache, MC, Volatile Area, Non-volatile Area. HDD/SSD figure from http://storage-system.fujitsu.com/jp/lib-f/tech/beginner/ssd/]

pm_wbarrier

Feature

Enforce the durability of a cacheline

Steps of usage

1. clflush A: flush the cacheline that contains A

2. sfence: ensure the completion of the store

3. pm_wbarrier: ensure the durability of every store to PM
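
The three steps above can be pictured with a toy model. This is only a sketch under stated assumptions: the class ToyPM, its dictionaries, and the helper durable_store are hypothetical names invented for illustration, and the model treats the cache and the memory controller (MC) buffer as simple dictionaries that vanish on a crash. It is not how the hardware or PMFS is implemented.

```python
class ToyPM:
    """Toy model (hypothetical): CPU cache -> MC buffer -> persistent media."""

    def __init__(self):
        self.cache = {}      # dirty cachelines, lost on power failure
        self.mc_buffer = {}  # writes queued in the MC, also lost on power failure
        self.media = {}      # durable contents of PM

    def store(self, addr, value):
        self.cache[addr] = value  # a plain store only reaches the cache

    def clflush(self, addr):
        if addr in self.cache:    # write the line back towards memory
            self.mc_buffer[addr] = self.cache.pop(addr)

    def sfence(self):
        pass  # ordering only; this toy model already runs everything in order

    def pm_wbarrier(self):
        self.media.update(self.mc_buffer)  # drain the MC buffer to the media
        self.mc_buffer.clear()

    def crash(self):
        self.cache.clear()       # volatile state disappears
        self.mc_buffer.clear()
        return dict(self.media)  # what survives the power failure


def durable_store(pm, addr, value):
    pm.store(addr, value)
    pm.clflush(addr)   # step 1: flush the cacheline
    pm.sfence()        # step 2: wait for completion of the store
    pm.pm_wbarrier()   # step 3: make everything that reached the MC durable
```

In this model, a store followed only by clflush and sfence is still lost by crash(), because the data sits in the MC buffer; after durable_store the value is on the media.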

Layout of PMFS

Consistency

Three existing techniques:

Copy on Write (CoW): used for updates on the data area

Journaling: used for updates on metadata (inode)

Log-structured updates

One more PM-specific technique:

Atomic in-place writes: used for updates of a small portion of data

Copy on Write (Shadow Paging)

Safe and consistent method to modify data

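Three steps:

1: Copy

2: Modify

3: Refer (point the parent at the new copy)

Recursive Copy!!! Changing the parent's pointer is itself a modification, so the parent must be copied too, and so on up the tree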
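
The Copy, Modify, Refer steps and the resulting recursive copy can be sketched on a small tree. This is an illustrative sketch only: Node and cow_update are hypothetical names, and the tree is an in-memory stand-in for file system blocks, not the layout of any real file system.

```python
class Node:
    """A block in a toy tree: named children plus an optional payload."""

    def __init__(self, children=None, data=None):
        self.children = dict(children or {})  # name -> Node (shallow copy)
        self.data = data


def cow_update(root, path, new_data):
    """Return a new root; every node on `path` is copied, the rest is shared."""
    copy = Node(root.children, root.data)  # 1: Copy
    if not path:
        copy.data = new_data               # 2: Modify
    else:
        head, rest = path[0], path[1:]
        # 3: Refer. The copied parent is pointed at the new child, which is
        # why the parent had to be copied as well: the recursive copy.
        copy.children[head] = cow_update(root.children[head], rest, new_data)
    return copy
```

The old root is never touched, so a crash at any point leaves a consistent old tree; the cost is that a one-leaf change copies every node on the path to the root.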

Journaling

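[Figure: journaling example on hello.txt. Operations 1: WRITE “RINKO” and 2: WRITE “NOW!!!” are recorded in the Log alongside a Snapshot. File state at the CRASH!: “Hello World!”, “RINKO”, “NXXXX”. File state after recovery: “Hello World!”, “RINKO”, “NOW!!!”.]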
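
The hello.txt example can be replayed in a few lines. This is a sketch under assumptions: the slide does not say whether the log is an undo or a redo log, so the code assumes a redo-style log replayed on top of a snapshot. The function recover and the tuple format of the log entries are hypothetical.

```python
def recover(snapshot, log):
    """Rebuild the file by replaying every logged write on top of the snapshot."""
    lines = list(snapshot)
    for index, text in log:
        while len(lines) <= index:  # grow the file if the write appends a line
            lines.append("")
        lines[index] = text
    return lines


snapshot = ["Hello World!"]  # last known consistent state of hello.txt
log = []

# Each write is appended to the log before the file itself is touched.
log.append((1, "RINKO"))
log.append((2, "NOW!!!"))

# Assumed crash in the middle of write 2: the file holds a torn line.
torn_file = ["Hello World!", "RINKO", "NXXXX"]

# Recovery ignores the torn file and replays the log onto the snapshot.
recovered = recover(snapshot, log)
```

The price of this safety is that every write is performed twice, once to the log and once to the file.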

Hybrid method

Metadata: updated by fine-grained logging

Data: uses the Copy on Write method

Method | Distributed small modifications | Centralized large modifications
Copy on Write | ☓ (write amplification) | ◯ (modify freely after the copy)
Journaling | ◯ (just append logs) | ☓ (double writes)

Fine-grained Logging

64-byte granularity is good for logging of file system metadata

Extended atomic in-place writes

8 bytes (the same as BPFS): update inode’s access time

16 bytes, using the cmpxchg16b instruction: update inode’s size and modification time

64 bytes, using RTM (introduced in Haswell, which has an erratum): update a number of inode fields, e.g. on delete
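
The size classes above amount to a simple selection rule, sketched below. The function name choose_update_method and the returned strings are hypothetical. Falling back to fine-grained logging for larger updates is an assumption based on the Hybrid method slide, and alignment requirements are ignored.

```python
def choose_update_method(nbytes):
    """Pick the cheapest mechanism that can update `nbytes` atomically."""
    if nbytes <= 0:
        raise ValueError("update size must be positive")
    if nbytes <= 8:
        return "8-byte atomic write"       # e.g. inode access time
    if nbytes <= 16:
        return "cmpxchg16b"                # e.g. inode size + modification time
    if nbytes <= 64:
        return "RTM"                       # e.g. several inode fields at once
    return "fine-grained logging"          # assumed fallback for larger updates
```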

Write Protection

Supervisor Mode Access Prevention (SMAP)

Prohibits kernel writes into the user area

Write windows (introduced in this paper)

PM is mapped as read-only

While writing, CR0.WP is cleared to zero

[Figure (right): protection rings, from http://en.wikipedia.org/wiki/Protection_ring]
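
The write-window idea can be modelled as a scoped toggle of the write-protect bit. This is a toy sketch, not kernel code: ToyCPU, ToyPMRegion, and write_window are hypothetical names, and a Python attribute stands in for CR0.WP.

```python
from contextlib import contextmanager


class ToyCPU:
    def __init__(self):
        self.cr0_wp = True  # WP set: kernel writes honour read-only mappings


class ToyPMRegion:
    """PM mapped read-only; writes succeed only while WP is cleared."""

    def __init__(self, cpu, size):
        self.cpu = cpu
        self.data = bytearray(size)

    def write(self, offset, payload):
        if self.cpu.cr0_wp:
            raise PermissionError("stray write to read-only PM")
        self.data[offset:offset + len(payload)] = payload


@contextmanager
def write_window(cpu):
    cpu.cr0_wp = False     # open the window: clear the write-protect bit
    try:
        yield
    finally:
        cpu.cr0_wp = True  # always close the window, even on error
```

Any write outside a `with write_window(cpu):` block raises PermissionError in this model, which is how a stray write would be caught.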

Implementation on Linux

Execute In Place (XIP)

Originally an interface for loading data directly from flash in environments with limited RAM

Used here to bypass the block device and page cache layers

Testing and Validation

Yat: Hypervisor-based validation framework

Ensures that cache flushing and pm_wbarrier are executed in the correct order

The Yat paper was published at USENIX ATC ’14

Evaluation

Environment

PM Emulation Platform (PMEP)

PM Block Driver (PMBD)

Results

File-based Access

Memory-Mapped I/O

Write Protection

Evaluation Settings

PM Emulation Platform (PMEP)

Configurable latencies and bandwidth for PM

Configurable pm_wbarrier latency

Environment

Memory channels partitioned using a custom BIOS?

Latency emulation: debug hook and HW counter counting LLC stall cycles

Bandwidth emulation: memory controller

Element | Value
CPU | Xeon (2.6 GHz), 8 cores x 2 sockets
DRAM | 16 GB
PM | 256 GB (NUMA disabled?)

PMBD

Persistent Memory Block Driver (PMBD), presented at MSST ’14

Introduced for fair comparison

Open-source implementation https://github.com/linux-pmbd/pmbd

Partition between DRAM and PM

Use non-temporal stores

File-based Access

File I/O (right 4 graphs)

Single thread

Single 64GB file

File Utilities (Bottom)

For Linux Kernel tarball

In-place updates/Logging

Effect of in-place updates

Compared with fine-grained logging:

Using 16-byte atomic writes: 1.8X faster

Using 64-byte atomic writes: 18% faster

Logging Overhead

Mmap

Random read/write in a single 64GB file

PMFS-D: default 4kB page

PMFS-L: 1GB page

The file is large enough not to fit in the page cache

Thanks to omitting the page cache

Neo4j (user application of mmap)

Dataset: 10M nodes/100M edges from Wikipedia dataset

Workload:

Delete: deleting 2000 nodes and associated edges

Insert: adding back the 2000 nodes and the edges

Query: selecting two nodes and calculating the shortest path

Improvement due to no copy overhead

Improvement due to synchronous write latency

Effect of Write Protection

Multi-threaded workload is serialized by writing the control register

Related Work

Enhance new storage: DFS[30], Log-structured File System[37], Conquest FS[41]

Hybrid of NVM and Disk or Flash: Rio File Cache[24], Conquest FS[41]

PM-only Storage: BPFS[27], SCMFS[43]

High Level API on PM: Failure-atomic msync[33], NV-Heaps[26], Mnemosyne[40], Library solutions[39]

Conclusion

Substantial benefits to legacy applications using the POSIX API

Well-considered consistency protocol

In-depth evaluation with a PM emulator