Memento: Coordinated In-Memory Caching for Data-Intensive Clusters



TRANSCRIPT

1
Memento: Coordinated In-Memory Caching for Data-Intensive Clusters
Ganesh Ananthanarayanan, Ali Ghodsi, Andrew Wang, Dhruba Borthakur, Srikanth Kandula, Scott Shenker, Ion Stoica

2
Data Intensive Computation
Data analytic clusters are pervasive
◦ Jobs run multiple tasks in parallel
◦ Jobs operate on petabytes of input
Distributed file systems (DFS) store data distributed and replicated
◦ Data reads are either disk-local or remote across the network

3

Access to disk is slow; memory is orders of magnitude faster

How do we leverage memory storage for datacenter jobs?

4

Can we store all data in memory?
Machines have tens of gigabytes of memory
But there is a huge discrepancy between storage and memory capacities
◦ Facebook cluster has ~200x more data on disk than memory
Use Memory as Cache

5

Will the data fit in cache?
Input sizes are heavy-tailed: >80% of all jobs together account for only 10% of the total input
The smallest 96% of jobs can fit in the memory cache

6

Elephants and mice
Mix of a few “large” jobs and very many “small” jobs
Large jobs:
◦ Batch operations
◦ Production jobs
Small jobs:
◦ Interactive queries (e.g., Hive, SCOPE)
◦ Experimental analytics

7

Challenge: Small Parallel Jobs
Job finishes when its last task finishes
◦ Need to cache all-or-nothing

8

In summary…

Only option for memory-locality is caching

96% of jobs can have their data in memory, if we cache it right

9

Outline
◦ FATE: Cache Replacement
◦ Memento: System Architecture
◦ Evaluation

10

We care about jobs finishing faster…

Job j that completed in time tn normally takes time tm with memory caching
◦ %Reduction_j = (tn - tm) / tn × 100
Metric: Average % Reduction in Completion Time
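A minimal sketch of this metric in Python (the function names and toy timings are illustrative, not from the slides):

```python
def percent_reduction(t_normal, t_cached):
    """% reduction in completion time for one job: (tn - tm) / tn * 100."""
    return (t_normal - t_cached) / t_normal * 100.0

def average_percent_reduction(jobs):
    """Average % reduction over (t_normal, t_cached) pairs, one per job."""
    return sum(percent_reduction(tn, tm) for tn, tm in jobs) / len(jobs)

# Toy example: one job runs 10x faster with caching (90% reduction), the other is unchanged.
print(average_percent_reduction([(100.0, 10.0), (100.0, 100.0)]))  # -> 45.0
```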

11

Traditional Cache Replacement

Traditional cache replacement policies (e.g., LRU, LFU) optimize for hit-ratio

◦ Belady’s MIN: Evict blocks that are to be accessed “farthest in future”

12
Belady’s MIN Example
Initial cache (4 data blocks): A B C D. Access order over time: E, F, B, D, C, A, …
MIN evicts the cached block whose next access is farthest in the future, so the cache evolves: A B C D -> E B C D -> E B F D
Result: 50% cache hit
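For reference, a small generic simulation of Belady’s MIN in Python; it is an illustration of the policy, not code from the talk, and its evictions on this short prefix can differ from the slide’s (the slide’s access stream continues beyond the blocks shown):

```python
def belady_min(accesses, cache, capacity):
    """On a miss with a full cache, evict the cached block whose next access
    lies farthest in the future (blocks never reused again are evicted first)."""
    cache, hits = set(cache), 0
    for i, block in enumerate(accesses):
        if block in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            future = accesses[i + 1:]
            victim = max(cache,
                         key=lambda b: future.index(b) if b in future else float("inf"))
            cache.remove(victim)
        cache.add(block)
    return hits

# Toy prefix from the slide: initial cache A B C D, then accesses E, F, B, D, C, A.
print(belady_min(list("EFBDCA"), {"A", "B", "C", "D"}, capacity=4))  # 3 of 6 accesses hit
```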

13
MIN: How much do jobs benefit?
Setup: 4 computation slots; memory-local tasks are 10x (or 90%) faster. Job J1 reads blocks A and B; job J2 reads blocks C and D.
With MIN’s cache contents (E B F D), B and D are memory-local but A and C are not, so each job still waits on a disk task.
Reduction: J1 0%, J2 0%. Average: (0 + 0)/2 = 0%

14
“Whole-job” inputs
Same access order (E, F, B, D, C, A, …) and the same 50% cache hit, but the cache now retains whole-job inputs: A B C D -> A B E D -> A B E F
J1’s entire input (A, B) stays cached; J2’s input (C, D) is evicted.

15
How much do jobs benefit with “whole-job” caching?
Same setup: 4 computation slots; memory-local tasks are 10x (or 90%) faster.
With cache contents A B E F, J1’s whole input is in memory, so J1 runs entirely memory-local; J2 gets no benefit.
Reduction: J1 90%, J2 0%. Average: (90 + 0)/2 = 45%
With MIN: Average (0 + 0)/2 = 0%. Cache hit-ratio is not the most suited metric.
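The job-level arithmetic above can be written out directly; this sketch assumes, as the slides do, that a memory-local task is 10x faster and that a single-wave job only benefits when all of its input blocks are cached:

```python
def job_percent_reduction(input_blocks, cached_blocks, speedup=10.0):
    """All-or-nothing: the job finishes when its last task does, so it speeds
    up only if every input block is in the cache."""
    if set(input_blocks) <= set(cached_blocks):
        return (1.0 - 1.0 / speedup) * 100.0   # 10x faster -> 90% reduction
    return 0.0

jobs = {"J1": ["A", "B"], "J2": ["C", "D"]}
for policy, cache in [("MIN", {"E", "B", "F", "D"}), ("whole-job", {"A", "B", "E", "F"})]:
    reductions = [job_percent_reduction(blocks, cache) for blocks in jobs.values()]
    print(policy, sum(reductions) / len(reductions))   # MIN -> 0.0, whole-job -> 45.0
```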

16

FATE Cache Replacement
Maximize “whole-job” inputs in cache
Needs global coordination
◦ Parallel tasks are distributed over different machines
Property:
◦ Small jobs get preference
◦ Large jobs benefit from the remaining cache space
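One hedged sketch of what “maximize whole-job inputs” could look like at eviction time (an assumption for illustration, not the policy’s published algorithm): evict first from inputs that are already incomplete in the cache, otherwise from the largest complete input, so small complete inputs survive.

```python
from collections import defaultdict

def choose_victim(cache, job_of_block, job_input_size):
    """Pick a block to evict while preserving small 'whole-job' inputs.
    Illustrative whole-job-aware heuristic, not the paper's exact algorithm."""
    cached_per_job = defaultdict(int)
    for b in cache:
        cached_per_job[job_of_block[b]] += 1

    def badness(block):
        job = job_of_block[block]
        incomplete = cached_per_job[job] < job_input_size[job]
        return (incomplete, job_input_size[job])   # incomplete inputs first, then largest

    return max(cache, key=badness)

# Toy usage: J1 (2 blocks) is fully cached, J2 has only 1 of its 2 blocks cached.
job_of_block = {"A": "J1", "B": "J1", "D": "J2"}
job_input_size = {"J1": 2, "J2": 2}
print(choose_victim({"A", "B", "D"}, job_of_block, job_input_size))   # -> 'D'
```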

17

Waves in the job
◦ Single wave (small jobs): all-or-nothing
◦ Multiple waves (large jobs): linear benefits
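A back-of-the-envelope model of the wave argument (illustrative numbers; assumes every task of a cached wave is memory-local and 10x faster):

```python
def wave_percent_reduction(waves_cached, total_waves, speedup=10.0):
    """Each fully cached wave runs 'speedup'x faster, so the job-level benefit
    scales with the fraction of waves whose input is cached."""
    per_wave_gain = (1.0 - 1.0 / speedup) * 100.0      # 90% per memory-local wave
    return per_wave_gain * waves_cached / total_waves

print(wave_percent_reduction(0, 1))    # single-wave job, input not fully cached -> 0.0
print(wave_percent_reduction(1, 1))    # single-wave job, whole input cached     -> 90.0
print(wave_percent_reduction(5, 10))   # multi-wave job, half its waves cached   -> 45.0
```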

18

Waves in the job

[Figure: task schedules contrasting a multiple-wave job with a single-wave job]

19

Outline
◦ FATE: Cache Replacement
◦ Memento: System Architecture
◦ Evaluation

20

Global coordination of local caches

Global cache view:
Block Id | Client Id | File Name
…        | …         | …
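A minimal sketch of a coordinator-side global view keyed by the fields in the table above (class and method names are assumptions for illustration):

```python
from dataclasses import dataclass

@dataclass
class CacheEntry:
    block_id: str
    client_id: str     # machine whose local memory holds the block
    file_name: str

class GlobalCacheView:
    """Coordinator-side map: which client caches which block of which file."""
    def __init__(self):
        self.entries = {}

    def add(self, block_id, client_id, file_name):
        self.entries[block_id] = CacheEntry(block_id, client_id, file_name)

    def lookup(self, block_id):
        return self.entries.get(block_id)    # None if no machine caches the block

    def evict(self, block_id):
        self.entries.pop(block_id, None)

view = GlobalCacheView()
view.add("blk_001", "client-17", "/logs/part-0001")
print(view.lookup("blk_001"))
```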

21

Memento: Salient Features

External Service

Local cache reads

Metadata communication
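To make those three features concrete, here is a hedged client-side sketch: block data is served from the machine’s local cache, and only metadata flows to the coordinator (names and the callback interface are assumptions, not the system’s actual API):

```python
class MementoClient:
    """Per-machine cache sketch: local cache reads for data, metadata-only
    communication with the external coordinator service."""
    def __init__(self, machine_id, report_to_coordinator):
        self.machine_id = machine_id
        self.report = report_to_coordinator      # metadata callback, e.g. an RPC stub
        self.local_cache = {}                    # block_id -> block bytes in memory

    def read(self, block_id, file_name, read_from_dfs):
        block = self.local_cache.get(block_id)
        if block is None:                        # miss: fall back to the DFS
            block = read_from_dfs(block_id)      # disk-local or remote read
            self.local_cache[block_id] = block
            self.report(block_id, self.machine_id, file_name)   # metadata only
        return block                             # served from local memory on later reads

# Toy usage with stand-in functions:
client = MementoClient("client-17", lambda *meta: print("coordinator <-", meta))
client.read("blk_001", "/logs/part-0001", lambda bid: b"...bytes...")
```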

22

Outline
◦ FATE: Cache Replacement
◦ Memento: System Architecture
◦ Evaluation

23

Evaluation
HDFS running in conjunction with Memento
Microsoft and Facebook traces replayed
◦ Jobs replayed with the same inter-arrival times
Deployment on an EC2 cluster of 100 machines
◦ 20GB memory for Memento
Jobs binned by their size

24

Job Distribution, by bins

25

Jobs are 77% faster on average
Small jobs see an 85% reduction in completion time

26

Cache hit-ratio matters less
The average job is faster by 77% with FATE vs. 49% with MIN

27

Memento scales sufficiently
◦ The coordinator handles 10,000 simultaneous client communications
◦ Each client can handle eight simultaneous local map tasks
Sufficient for current datacenter loads

28

Ongoing / Future work >>

29

Simpler Implementation [1]
Ride the OS cache
◦ Estimate where a block is cached
Change the job manager to track block accesses
◦ No FATE; use the default policy (LRU?)
Initial results show a 2.3x improvement in cache hit-rate
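A rough sketch of what “track block accesses and ride the OS cache” might look like in the job manager, assuming the OS cache is approximated with an LRU model (capacity and policy are assumptions):

```python
from collections import OrderedDict

class OSCacheEstimate:
    """Estimate which blocks the OS buffer cache still holds by replaying the
    job manager's view of block accesses through an LRU model."""
    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()          # block_id -> None, most recent last

    def record_access(self, block_id):
        self.blocks.pop(block_id, None)
        self.blocks[block_id] = None         # move to most-recently-used position
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # presume the OS evicted the LRU block

    def probably_cached(self, block_id):
        return block_id in self.blocks
```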

30

Alternate Metrics [2]
We optimize for “average % reduction in completion time” of jobs
Average:
◦ Weighted to include job priorities?
Other metrics:
◦ Reduction of load on disk subsystem?
◦ Utilization?

31

Solid State Devices [3]
SSDs: a new layer in the storage hierarchy
Hierarchical caching:
◦ Include SSDs between disk and memory
What’s the best cache replacement policy?

32

Summary
Memory caching can be surprisingly effective
◦ …despite the disk and memory capacity discrepancy
Memento: coordinated cache management
◦ FATE replacement policy (“whole-jobs”)
Encouraging results for datacenter workloads
