di-mmap: a high performance memory-map runtime for data-intensive applications · 2013. 10. 26. ·...

37
Introduction DI-MMAP Runtime Experiments Conclusions DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications DISCS 2012 Nov. 16, 2012 Brian Van Essen, Henry Hsieh (UCLA), Sasha Ames, Maya Gokhale LLNL-PRES-602272 This work performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344. LLNL-PRES-602272 Slide 1

Upload: others

Post on 29-Sep-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Introduction DI-MMAP Runtime Experiments Conclusions

DI-MMAP: A High Performance Memory-MapRuntime for Data-Intensive ApplicationsDISCS 2012Nov. 16, 2012

Brian Van Essen, Henry Hsieh (UCLA), Sasha Ames, Maya Gokhale

LLNL-PRES-602272This work performed under the auspicesof the U.S. Department of Energy byLawrence Livermore National Laboratoryunder Contract DE-AC52-07NA27344.

LLNL-PRES-602272 Slide 1

Page 2: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Motivation

Enable scalable out-of-core computations for data-intensivecomputing.

Effectively integrate non-volatile random access memory into the HPCnode’s memory architecture.

Address data-intensive computing scalability challenges:I Use node-local NVRAM to support larger working setsI DRAM-cached NVRAM to extend main memory

Allow latency-tolerant applications to be oblivious to transitions fromdynamic to persistent memory when accessing out-of-core data.

LLNL-PRES-602272 Slide 2

Page 3: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

HPC Challenges and opportunities

I Data-intensive high-performance computing applications:I processing of massive real-world graphsI bioinformatics / computational biologyI streamline tracing (in-situ VDA)

I Creating data-intensive architecture is costly and power-intensiveI In traditional HPC architecture DRAM per core is going downI DRAM is expensive: cost and power

I NVRAM technologies promise:I lower latencyI higher densityI better concurrencyI minimal static power→ lower average power

LLNL-PRES-602272 Slide 3

Page 4: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Data-Intensive High-Performance Computing

Data-Intensive Applications:I large data setsI large working sets that exceed capacity of main memoryI memory bound

I irregular data accessI latency sensitiveI minimal computation

Latency-tolerant algorithms:I highly concurrentI avoid bulk synchronous communicationI potentially asynchronous execution

LLNL-PRES-602272 Slide 4

Page 5: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Integrating future NVRAM

Peripherally attached storage in near term

I 2-4 year horizonI Existing PCIe-attached Flash storage

High-performance PCIe-attached NVRAM

I Low access latencyI Efficient random accessI Faster peripheral bus

LLNL-PRES-602272 Slide 5

Page 6: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Challenges for HPC Runtime

Integrating high-performance storage requires:I explicit out-of-core algorithmsI seamless integration of storage into memory hierarchy

I e.g. high-performance memory-map

Linux memory-map runtime does not:I scale well with increased concurrencyI perform well when memory is not freely available

. . . optimize memory-map runtime for data-intensive computing

LLNL-PRES-602272 Slide 6

Page 7: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Direct I/O or memory-mapped I/O

Direct I/O - direct access to NVRAM pagesI Avoids overheads of software stackI Good for fetching multiple pages of data at once

Memory-Mapped I/O - map file/device into app’s virtual memoryI Good for word-level accessI Word access to cached pages is at memory speedsI Eliminates dichotomy between storage and memory

I Data structures easily transition out-of-coreI Can sacrifice performance

Memory-mapped I/O can seamlessly extended the memory hierarchy

LLNL-PRES-602272 Slide 7

Page 8: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Data-intensive memory-map runtime (DI-MMAP)

A high-performance alternative to Linux mmap:I performance scales with increased concurrencyI performance does not degrade under memory pressureI explicit assignment from data structures to buffers

DI-MMAP features:I a fixed sized page bufferI minimal dynamic memory allocationI a simple FIFO buffer replacement policyI preferential caching for frequently accessed pages

LLNL-PRES-602272 Slide 8

Page 9: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Using DI-MMAP

The DI-MMAP device driver:1. is loaded into a running Linux kernel2. it allocates a fixed amount of main memory for page buffering3. it creates a control interface file in the /dev filesystem

Once loaded:1. the control file is then used to create pseudo-files in /dev

2. pseudo-files link (i.e. redirect) to block devices in the system3. accesses to a pseudo-file are redirected to the linked block device4. pseudo-file is memory mapped into the applications virtual

memory space

LLNL-PRES-602272 Slide 9

Page 10: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Buffer Management

DI-MMAP Buffer

Primary FIFO?

Hotpage FIFO

Eviction Queue

is a hot page

remove fromPage Tables

page fault

page evicted Free Page List

TLB Flush / writeback page

DI-MMAP Buffer Page Location Table

page recovered

Minimize the amount of effort needed to find a page to evict:I In the steady state a page is evicted on each page faultI Track recently evicted pages to maintain temporal reuseI Allow bulk TLB operations to reduce inter-processor interrupts

LLNL-PRES-602272 Slide 10

Page 11: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Buffer Management

DI-MMAP Buffer

Primary FIFO?

Hotpage FIFO

Eviction Queue

is a hot page

remove fromPage Tables

page fault

page evicted Free Page List

TLB Flush / writeback page

DI-MMAP Buffer Page Location Table

page recovered

Minimize the amount of effort needed to find a page to evict:I In the steady state a page is evicted on each page faultI Track recently evicted pages to maintain temporal reuseI Allow bulk TLB operations to reduce inter-processor interrupts

LLNL-PRES-602272 Slide 10

Page 12: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Buffer Management

DI-MMAP Buffer

Primary FIFO?

Hotpage FIFO

Eviction Queue

is a hot page

remove fromPage Tables

page fault

page evicted Free Page List

TLB Flush / writeback page

DI-MMAP Buffer Page Location Table

page recovered

Minimize the amount of effort needed to find a page to evict:I In the steady state a page is evicted on each page faultI Track recently evicted pages to maintain temporal reuseI Allow bulk TLB operations to reduce inter-processor interrupts

LLNL-PRES-602272 Slide 10

Page 13: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Buffer Management

DI-MMAP Buffer

Primary FIFO?

Hotpage FIFO

Eviction Queue

is a hot page

remove fromPage Tables

page fault

page evicted Free Page List

TLB Flush / writeback page

DI-MMAP Buffer Page Location Table

page recovered

Minimize the amount of effort needed to find a page to evict:I In the steady state a page is evicted on each page faultI Track recently evicted pages to maintain temporal reuseI Allow bulk TLB operations to reduce inter-processor interrupts

LLNL-PRES-602272 Slide 10

Page 14: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Buffer Management

DI-MMAP Buffer

Primary FIFO?

Hotpage FIFO

Eviction Queue

is a hot page

remove fromPage Tables

page fault

page evicted Free Page List

TLB Flush / writeback page

DI-MMAP Buffer Page Location Table

page recovered

Minimize the amount of effort needed to find a page to evict:I In the steady state a page is evicted on each page faultI Track recently evicted pages to maintain temporal reuseI Allow bulk TLB operations to reduce inter-processor interrupts

LLNL-PRES-602272 Slide 10

Page 15: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Livermore random I/O testbench (LRIOT)Currently lack tools to effectively measure and evaluate NVRAM

I high speed

I highly concurrency

I tolerate complex and unstructured access patterns

FIO: industry standard for benchmarkingI Does not scale well

I Cannot mix concurrency with both processes and threads

LRIOT: high concurrency / high throughput benchmarking toolI Supports a mixture of processes and threads

I Multiple random and deterministic access patterns

I More deterministic timing measurements

LLNL-PRES-602272 Slide 11

Page 16: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

LRIOT system setup

Test platform:I 16 core AMD 8356 Opteron system @ 2.3GHzI 64 GiB of DRAMI RHEL 6 2.6.32I 3× 80 GiB SLC NAND Flash Fusion-io ioDrive PCIe 1.1 x4 cards

I striped RAID 0

Benchmark:I uniform random I/O patternI 6.4 million reads (unique pages)→ 24 GiB working setI 128 GiB file

LLNL-PRES-602272 Slide 12

Page 17: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Read-only LRIOT benchmark

Linux mmap:I Unconstrained performs wellI drops dramatically with 8GiB of

page cache

DI-MMAP:I much better with fixed sized

bufferI only loses 15% performance

from direct I/O with 1 GiB buffer

0 50 100 150 200 250

0e+

001e

+05

2e+

053e

+05

4e+

05

Num. Threads

IOP

S

● ●

LRIOT read−only: 3x Fusion−io

(0) Direct I/O(1) Memory−mapped I/O (Unconstrained)(2) Memory−mapped I/O 8GiB Buffer(3) DI−MMAP I/O 4GiB Buffer(4) DI−MMAP I/O 1GiB Buffer

LLNL-PRES-602272 Slide 13

Page 18: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Read-only LRIOT benchmark

Linux mmap:I Unconstrained performs wellI drops dramatically with 8GiB of

page cache

DI-MMAP:I much better with fixed sized

bufferI only loses 15% performance

from direct I/O with 1 GiB buffer

0 50 100 150 200 250

0e+

001e

+05

2e+

053e

+05

4e+

05

Num. Threads

IOP

S

● ●

LRIOT read−only: 3x Fusion−io

(0) Direct I/O(1) Memory−mapped I/O (Unconstrained)(2) Memory−mapped I/O 8GiB Buffer(3) DI−MMAP I/O 4GiB Buffer(4) DI−MMAP I/O 1GiB Buffer

LLNL-PRES-602272 Slide 13

Page 19: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Read-only LRIOT benchmark

Linux mmap:I Unconstrained performs wellI drops dramatically with 8GiB of

page cache

DI-MMAP:I much better with fixed sized

bufferI only loses 15% performance

from direct I/O with 1 GiB buffer

0 50 100 150 200 250

0e+

001e

+05

2e+

053e

+05

4e+

05

Num. Threads

IOP

S

● ●

LRIOT read−only: 3x Fusion−io

(0) Direct I/O(1) Memory−mapped I/O (Unconstrained)(2) Memory−mapped I/O 8GiB Buffer(3) DI−MMAP I/O 4GiB Buffer(4) DI−MMAP I/O 1GiB Buffer

LLNL-PRES-602272 Slide 13

Page 20: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Read-only LRIOT benchmark

Linux mmap:I Unconstrained performs wellI drops dramatically with 8GiB of

page cache

DI-MMAP:I much better with fixed sized

bufferI only loses 15% performance

from direct I/O with 1 GiB buffer

0 50 100 150 200 250

0e+

001e

+05

2e+

053e

+05

4e+

05

Num. Threads

IOP

S

● ●

LRIOT read−only: 3x Fusion−io

(0) Direct I/O(1) Memory−mapped I/O (Unconstrained)(2) Memory−mapped I/O 8GiB Buffer(3) DI−MMAP I/O 4GiB Buffer(4) DI−MMAP I/O 1GiB Buffer

LLNL-PRES-602272 Slide 13

Page 21: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Read-only LRIOT benchmark

Linux mmap:I Unconstrained performs wellI drops dramatically with 8GiB of

page cache

DI-MMAP:I much better with fixed sized

bufferI only loses 15% performance

from direct I/O with 1 GiB buffer

0 50 100 150 200 250

0e+

001e

+05

2e+

053e

+05

4e+

05

Num. Threads

IOP

S

● ●

LRIOT read−only: 3x Fusion−io

(0) Direct I/O(1) Memory−mapped I/O (Unconstrained)(2) Memory−mapped I/O 8GiB Buffer(3) DI−MMAP I/O 4GiB Buffer(4) DI−MMAP I/O 1GiB Buffer

LLNL-PRES-602272 Slide 13

Page 22: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Write-only LRIOT benchmark

DI-MMAP:I on par with unconstrained Linux

mmap

I > 2× Linux mmap with 8GiBpage cache

I does not match performance ofdirect I/O(subject to further investigation)

0 50 100 150 200 250

0e+

001e

+05

2e+

053e

+05

4e+

05

Num. Threads

IOP

S

●●

●●

LRIOT write−only: 3x Fusion−io

(0) Direct I/O(1) Memory−mapped I/O (Unconstrained)(2) Memory−mapped I/O 8GiB Buffer(3) DI−MMAP I/O 4GiB Buffer(4) DI−MMAP I/O 1GiB Buffer

LLNL-PRES-602272 Slide 14

Page 23: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Write-only LRIOT benchmark

DI-MMAP:I on par with unconstrained Linux

mmap

I > 2× Linux mmap with 8GiBpage cache

I does not match performance ofdirect I/O(subject to further investigation)

0 50 100 150 200 250

0e+

001e

+05

2e+

053e

+05

4e+

05

Num. Threads

IOP

S

●●

●●

LRIOT write−only: 3x Fusion−io

(0) Direct I/O(1) Memory−mapped I/O (Unconstrained)(2) Memory−mapped I/O 8GiB Buffer(3) DI−MMAP I/O 4GiB Buffer(4) DI−MMAP I/O 1GiB Buffer

LLNL-PRES-602272 Slide 14

Page 24: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Write-only LRIOT benchmark

DI-MMAP:I on par with unconstrained Linux

mmap

I > 2× Linux mmap with 8GiBpage cache

I does not match performance ofdirect I/O(subject to further investigation)

0 50 100 150 200 250

0e+

001e

+05

2e+

053e

+05

4e+

05

Num. Threads

IOP

S

●●

●●

LRIOT write−only: 3x Fusion−io

(0) Direct I/O(1) Memory−mapped I/O (Unconstrained)(2) Memory−mapped I/O 8GiB Buffer(3) DI−MMAP I/O 4GiB Buffer(4) DI−MMAP I/O 1GiB Buffer

LLNL-PRES-602272 Slide 14

Page 25: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Write-only LRIOT benchmark

DI-MMAP:I on par with unconstrained Linux

mmap

I > 2× Linux mmap with 8GiBpage cache

I does not match performance ofdirect I/O(subject to further investigation)

0 50 100 150 200 250

0e+

001e

+05

2e+

053e

+05

4e+

05

Num. Threads

IOP

S

●●

●●

LRIOT write−only: 3x Fusion−io

(0) Direct I/O(1) Memory−mapped I/O (Unconstrained)(2) Memory−mapped I/O 8GiB Buffer(3) DI−MMAP I/O 4GiB Buffer(4) DI−MMAP I/O 1GiB Buffer

LLNL-PRES-602272 Slide 14

Page 26: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Write-only LRIOT benchmark

DI-MMAP:I on par with unconstrained Linux

mmap

I > 2× Linux mmap with 8GiBpage cache

I does not match performance ofdirect I/O(subject to further investigation)

0 50 100 150 200 250

0e+

001e

+05

2e+

053e

+05

4e+

05

Num. Threads

IOP

S

●●

●●

LRIOT write−only: 3x Fusion−io

(0) Direct I/O(1) Memory−mapped I/O (Unconstrained)(2) Memory−mapped I/O 8GiB Buffer(3) DI−MMAP I/O 4GiB Buffer(4) DI−MMAP I/O 1GiB Buffer

LLNL-PRES-602272 Slide 14

Page 27: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Microbenchmarks system setup

Test platform:I 8 core AMD 2378 Opteron system @ 2.4GHzI 16 GiB of DRAMI 2× 200 GiB SLC NAND Flash Virident tachIOn Drive PCIe 1.1 x8

Benchmarks:1. Binary search on sorted vector2. Lookup on Ordered Map (Red-Black Tree)3. Lookup on Unordered Map (Hash Table)

I database size ranged from ∼ 112GiB to ∼ 135GiBI each micro-benchmark issued 220 queries

LLNL-PRES-602272 Slide 15

Page 28: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Microbenchmarks: BST and Ordered MapDI-MMAP:

I significantly exceeds the performance of Linux mmap when each is constrained to anequal amount of buffering

I approaches the performance of mmap with no memory constraint

0 50 100 150 200 250

010

000

2000

030

000

4000

050

000

Num. Threads

Look

ups

per

seco

nd

●●

Binary Search on Sorted Vector: 2x Virident

(0) Memory−mapped I/O (Unconstrained)(1) Memory−mapped I/O 8GiB Buffer(2) Memory−mapped I/O 4GiB Buffer(3) DI−MMAP I/O 4GiB Buffer

0 50 100 150 200 250

010

000

2000

030

000

4000

050

000

Num. Threads

Look

ups

per

seco

nd

●●

Lookup on Ordered Map (Red−Black Tree): 2x Virident

(0) Memory−mapped I/O (Unconstrained)(1) Memory−mapped I/O 8GiB Buffer(2) Memory−mapped I/O 4GiB Buffer(3) DI−MMAP I/O 4GiB Buffer

LLNL-PRES-602272 Slide 16

Page 29: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Microbenchmarks: Unordered mapDI-MMAP:

I significantly exceeds the performance of Linux mmap when each is constrained to anequal amount of buffering

I approaches the performance of mmap with no memory constraint

0 50 100 150 200 250

0e+

002e

+04

4e+

046e

+04

8e+

041e

+05

Num. Threads

Look

ups

per

seco

nd

●●

Lookup on Unordered Map (Hash Table): 2x Virident

(0) Memory−mapped I/O (Unconstrained)(1) Memory−mapped I/O 8GiB Buffer(2) Memory−mapped I/O 4GiB Buffer(3) DI−MMAP I/O 4GiB Buffer

LLNL-PRES-602272 Slide 17

Page 30: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Metagenomic Search & Classification

Metagenomics:I sequencing of heterogenous genetic fragmentsI fragments (aka reads) may be derived from many organisms

Application queries a database of genetic markers called k-mers:I length k sequences out of a DNA, RNA, or protein alphabetI k-mer database stored in Flash storageI access patterns to the datasets are extremely randomI classification requires global view of reference database

Two tests:I k-mer lookupI sample classification

LLNL-PRES-602272 Slide 18

Page 31: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Metagenomics Search & Classification system setup

Test platform:I 4 socket, 40 core, Intel E7 4850 @ 2 GHzI 1 TiB DRAMI Linux kernel 2.6.32 (Red Hat Enterprise 6).I 2× Fusion-io 1.2 TB ioDrive2 cards PCIe-2.0 x4

I RAID 0I block sizes of 4 KiB

I 16 GiB DRAM available for buffer cache

Application:I k = 18I database size is 635 GiB

LLNL-PRES-602272 Slide 19

Page 32: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

K-mer lookup

Number of Threads0 50 100 150 200 250 300 350

K−

mer

s pe

r se

cond

0

10000

20000

30000

40000

50000

60000

70000

80000

DI−MMAP

standard−fs−mmap

Peak Performance:I 16 threads with Linux mmapI 240 threads for DI-MMAP

Lookups per second with DI-MMAP is 4.92× higher thanwith Linux mmap

LLNL-PRES-602272 Slide 20

Page 33: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Metagenomic Sample Classification

Input set (metagenome)SRX DRR ERR

Bas

es p

er s

econ

d

0

20000

40000

60000

80000

100000 FS_MMAP_16_threadsDI−MMAP_16_threadsFS_MMAP_160_threadsDI−MMAP_160_threads

Near peak performance:I 16 threads for Linux

mmap

I 160 threads for DI-MMAP

Performance advantage ofDI-MMAP vs. Linux mmap:

I 4.88× for SRX input setI 3.66× for ERR input set

Performance varies with:I % of redundant k-mersI diversity of metagenome

(e.g. ERR)

LLNL-PRES-602272 Slide 21

Page 34: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Metagenomic Sample Classification

Input set (metagenome)SRX DRR ERR

Bas

es p

er s

econ

d

0

20000

40000

60000

80000

100000 FS_MMAP_16_threadsDI−MMAP_16_threadsFS_MMAP_160_threadsDI−MMAP_160_threads

Near peak performance:I 16 threads for Linux

mmap

I 160 threads for DI-MMAP

Performance advantage ofDI-MMAP vs. Linux mmap:

I 4.88× for SRX input setI 3.66× for ERR input set

Performance varies with:I % of redundant k-mersI diversity of metagenome

(e.g. ERR)

LLNL-PRES-602272 Slide 21

Page 35: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Metagenomic Sample Classification

Input set (metagenome)SRX DRR ERR

Bas

es p

er s

econ

d

0

20000

40000

60000

80000

100000 FS_MMAP_16_threadsDI−MMAP_16_threadsFS_MMAP_160_threadsDI−MMAP_160_threads

Near peak performance:I 16 threads for Linux

mmap

I 160 threads for DI-MMAP

Performance advantage ofDI-MMAP vs. Linux mmap:

I 4.88× for SRX input setI 3.66× for ERR input set

Performance varies with:I % of redundant k-mersI diversity of metagenome

(e.g. ERR)

LLNL-PRES-602272 Slide 21

Page 36: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Conclusions

The data-intensive memory-map (DI-MMAP) runtime:1. provides scalable, out-of-core performance for data-intensive applications

2. allows increased performance of algorithms with increased concurrency

3. performance does not significantly degrade with smaller buffer size

DI-MMAP:I provides a viable solution for scalable out-of-core algorithmsI offloads the explicit buffering requirements from the application to the runtimeI allows the application to access its external data through a simple load/store

interfaceI hides much of the complexity of data movementI approaches the raw, peak performance of direct I/O

LLNL-PRES-602272 Slide 22

Page 37: DI-MMAP: A High Performance Memory-Map Runtime for Data-Intensive Applications · 2013. 10. 26. · Data-Intensive High-Performance Computing Data-Intensive Applications: I large

Lawrence Livermore National Laboratory

Introduction DI-MMAP Runtime Experiments Conclusions

Thank You!

Questions?

Open source release is in progress:

https://computation.llnl.gov/casc/dcca-pub/dcca/Data-centric_architecture.html

LLNL-PRES-602272 Slide 23