rocksteady: fast migration for low-latency in-memory storagecrash safety during migration • need...

50
Rocksteady: Fast Migration for Low-Latency In-memory Storage Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman 1

Upload: others

Post on 17-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady: Fast Migration for Low-LatencyIn-memory Storage

Chinmay Kulkarni, Aniraj Kesavan, Tian Zhang, Robert Ricci, Ryan Stutsman

1

Page 2: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Introduction• Distributed low-latency in-memory key-value stores are emerging

• Predictable response times ~10 µs median, ~60 µs 99.9th-tile

• Problem: Must migrate data between servers• Minimize performance impact of migration → go slow?• Quickly respond to hot spots, skew shifts, load spikes → go fast?

• Solution: Fast data migration with low impact• Early ownership transfer of data, leverage workload skew• Low priority, parallel and adaptive migration

• Result: Migration protocol for RAMCloud in-memory key-value store• Migrates 256 GB in 6 minutes, 99.9th-tile latency less than 250 µs• Median latency recovers from 40 µs to 20 µs in 14 s

2

Page 3: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

multiGet( ) multiGet( )

Why Migrate Data?

Poor spatial locality → High multiGet() fan-out → More RPCs

3

Server 1 Server 2

No LocalityFanout=7

A C B D

A B

Client 1

C D

Client 2

6 Million

Page 4: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

multiGet( )multiGet( )

Migrate To Improve Spatial Locality

4

Server 1 Server 2

A C B D

A B C D

C

BNo LocalityFanout=7

Client 1 Client 2

6 Million

Page 5: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

multiGet( ) multiGet( )

Spatial Locality Improves Throughput

Better spatial locality → Fewer RPCs → Higher throughputBenefits multiGet(), range scans

5

Server 1 Server 2

A B C D

A B C D

Full LocalityFanout=1

No LocalityFanout=7

Client 1 Client 2

6 Million

25 Million

Page 6: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

The RAMCloud Key-Value Store

6

Data Center Fabric

Master

Backup

Master

Backup

Master

Backup

Master

Backup

Client Client Client Client

Coordinator

All Data in RAM

Kernel Bypass/DPDK

10 µs reads

Page 7: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

The RAMCloud Key-Value Store

7

Data Center Fabric

Master

Backup

Master

Backup

Master

Backup

Master

Backup

Client Client Client Client

Coordinator

Write RPC

Page 8: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

The RAMCloud Key-Value Store

8

Data Center Fabric

Master

Backup

Master

Backup

Master

Backup

Master

Backup

Client Client Client Client

Coordinator1x in DRAM

3x on Disk

Write RPC

Page 9: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Fault-tolerance & Recovery In RAMCloud

9

Data Center Fabric

Master

Backup

Master

Backup

Master

Backup

Master

Backup

Client Client Client Client

Coordinator

Page 10: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Fault-tolerance & Recovery In RAMCloud

10

Data Center Fabric

Master

Backup

Master

Backup

Master

Backup

Master

Backup

Client Client Client Client

Coordinator 2 seconds to recover

Page 11: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Performance Goals For Migration

• Maintain low access latency• 10 µsec median latency → System extremely sensitive• Tail latency matters at scale → Even more sensitive

• Migrate data fast• Workloads dynamic → Respond quickly• Growing DRAM storage: 512 GB per server

• Slow data migration → Entire day to scale cluster

11

Page 12: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady Overview: Early Ownership Transfer

Problem: Loaded source can bottleneck migrationSolution: Instantly shift ownership and all load to target

12

Source Server Target Server

Client 1 Client 2

Reads and Writes

Client 3 Client 4

Page 13: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady Overview: Early Ownership Transfer

Problem: Loaded source can bottleneck migrationSolution: Instantly shift ownership and all load to target

13

Source Server Target Server

Client 1 Client 2 Client 3 Client 4

Instantly Redirected All future operations

serviced at Target

Creates “headroom” to speed migration

Page 14: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady Overview: Leverage Skew

14

Source Server Target Server

On-demand Pull

Client 1 Client 2 Client 3 Client 4

Read

Problem: Data has not arrived at source yetSolution: On demand migration of unavailable data

Page 15: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady Overview: Leverage Skew

15

Source Server Target Server

Client 1 Client 2 Client 3 Client 4

Read

Problem: Data has not arrived at source yetSolution: On demand migration of unavailable data

Hot keys move early

Median Latency recovers to

20 µs in 14 s

Page 16: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady Overview: Adaptive and Parallel

16

Source Server Target Server

On-demand Pull

Client 1 Client 2 Client 3 Client 4

Problem: Old single-threaded protocol limited to 130 MB/sSolution: Pipelined and parallel at source and target

Parallel Pulls

Page 17: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady Overview: Adaptive and Parallel

17

Source Server Target Server

On-demand Pull

Client 1 Client 2 Client 3 Client 4

Problem: Old single-threaded protocol limited to 130 MB/sSolution: Pipelined and parallel at source and target

Target Driven

Yields toOn-demand Pulls

Moves 758 MB/sParallel Pulls

Page 18: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

18

Rocksteady Overview: Eliminate Sync Replication

Source Server Target ServerParallel Pulls

On-demand Pull

Client 1 Client 2 Client 3 Client 4

Replication

Problem: Synchronous replication bottleneck at targetSolution: Safely defer replication until after migration

Page 19: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

19

Rocksteady Overview: Eliminate Sync Replication

Source Server Target Server

Client 1 Client 2 Client 3 Client 4

Replication

Problem: Synchronous replication bottleneck at targetSolution: Safely defer replication until after migration

Page 20: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady: Putting it all together

• Instantaneous ownership transfer• Immediate load reduction at overloaded source• Creates “headroom” for migration work

• Leverage skew to rapidly migrate hot data• Target comes up to speed with little data movement

• Adaptive parallel, pipelined at source and target• All cores avoid stalls, but yield to client-facing operations

• Safely defer replication at target• Eliminates replication bottleneck and contention 20

Page 21: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady

• Instantaneous ownership transfer

• Leverage skew to rapidly migrate hot data

• Adaptive parallel, pipelined at source and target

• Safely defer synchronous replication at target

21

Page 22: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Source Server

Evaluation Setup

22

300 Million Records45 GB

Target Server

ClientYCSB-B (95/5)

Skew=0.99

ClientYCSB-B (95/5)

Skew=0.99

ClientYCSB-B (95/5)

Skew=0.99

ClientYCSB-B (95/5)

Skew=0.99

Page 23: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Evaluation Setup

23

150 Million Records22.5 GB

Target Server

ClientYCSB-B (95/5)

Skew=0.99

ClientYCSB-B (95/5)

Skew=0.99

ClientYCSB-B (95/5)

Skew=0.99

ClientYCSB-B (95/5)

Skew=0.99

150 Million Records22.5 GB

150 Million Records22.5 GB

Page 24: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Instantaneous Ownership Transfer

Before migration: Source over-loaded, Target under-loadedOwnership transfer creates Source headroom for migration

24

80%

25%

Before OwnershipTransfer

Immediately AfterTransfer

Sour

ce C

PU U

tiliza

tion

Created 55%Source CPU Headroom

Page 25: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady

• Instantaneous ownership transfer

• Leverage skew to rapidly migrate hot data

• Adaptive parallel, pipelined at source and target

• Safely defer synchronous replication at target

25

Page 26: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Leverage Skew To Move Hot Data

After ownership transfer, hot keys pulled on-demandMore skew → Median restored faster (migrate fewer hot keys)

26

240µs 245µs

155µs

75µs28µs 17µs

Uniform (Low) Skew=0.99 Skew=1.5 (High)

99.9th Latency Median Latency Before Migration:

Median=10 µs99.9th = 60 µs

Page 27: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady

• Instantaneous ownership transfer

• Leverage skew to rapidly migrate hot data

• Adaptive parallel, pipelined at source and target

• Safely defer synchronous replication at target

27

Page 28: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Parallel, Pipelined, & Adaptive Pulls

28

Target HashTable

0 8 16 24

WorkerCores

NIC

DispatchCore

Polling

replay replay read(B)

MigrationManager

Pull Buffers

pulling

Per-CoreBuffers

• Target driven, migration manager

• Co-partitioned hash tables, pull from partitions in parallel

• Replay pulled data into per-core buffers

Page 29: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Parallel, Pipelined, & Adaptive Pulls

29

Source

NIC

DispatchCore

Polling

read(A)

HashTable

0 8 16 24

pull(11) pull(17)

GatherList

GatherList

CopyAddresses

• Stateless passive Source

• Granular 20 KB pulls

Page 30: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Parallel, Pipelined, & Adaptive Pulls

30

• Redirect any idle CPU for migration

• Migration yields to regular requests, on-demand pulls

Page 31: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady

• Instantaneous ownership transfer

• Leverage skew to rapidly migrate hot data

• Adaptive parallel, pipelined at source and target

• Safely defer synchronous replication at target

31

Page 32: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Each server has a recovery log distributed across the cluster

Backup

Naïve Fault Tolerance During Migration

32

Source

A B C

Target

A

A C

Backup

C

Backup

A B

Backup

A B

Backup

B CSourceRecovery Log

TargetRecovery Log

Page 33: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Naïve Fault Tolerance During Migration

Migrated data needs to be triplicated to target’s recovery log

33

Source

A B C

Target

A

Backup

A C

Backup

C

Backup

A B

Backup

A B

Backup

B CSourceRecovery Log

TargetRecovery Log

Page 34: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Naïve Fault Tolerance During Migration

Migrated data needs to be triplicated to target’s recovery log

34

Source

A B C

Target

A

Backup

A C

Backup

C

Backup

A B

Backup

A B

Backup

B CSourceRecovery Log

TargetRecovery Log

A A A

Page 35: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Synchronous Replication Bottlenecks Migration

Synchronous replication hits migration speed by 34%

35

Source

A B C

Target

A B

Backup

A C

Backup

C

Backup

A B

Backup

A B

Backup

B CSourceRecovery Log

TargetRecovery Log

B A B A B A

Page 36: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Replicate at Target only after all data has been moved over

36

Source

A B C

Target

A B C

Backup

A C

Backup

C

Backup

A B

Backup

A B

Backup

B CSourceRecovery Log

TargetRecovery Log

Rocksteady: Safely Defer Replication At The Target

Page 37: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Writes/Mutations Served By Target

Mutations have to be replicated by the target

37

Source

A B C

Target

AB C’

Write

Backup

A C

Backup

C

Backup

A B

Backup

A B

Backup

B CSourceRecovery Log

TargetRecovery Log

C’ C’ C’

Page 38: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Crash Safety During Migration

• Need both Source and Target recovery log for data recovery

• Initial table state on Source recovery log• Writes/Mutations on Target recovery log

• Transfer ownership back to Source in case of crash• Migration cancelled• Recovery involves both recovery logs

• Source takes a dependency on Target recovery log at migration start

• Stored reliably at the cluster coordinator• Identifies position after which mutations present

38

Page 39: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

If The Source Crashes During Migration

Recover Source, recover from Target recovery log

39

Source

A B C

Target

AB C’

Backup

A C

Backup

C

Backup

A B

Backup

A B

Backup

B CSourceRecovery Log

TargetRecovery Log

C’ C’ C’

Page 40: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

If The Target Crashes During Migration

Recover from Source and Target recovery log, recover Target

40

Source

A B C

Target

AB C’

Backup

A C

Backup

C

Backup

A B

Backup

A B

Backup

B CSourceRecovery Log

TargetRecovery Log

C’ C’ C’

Page 41: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Crash Safety During Migration

• Need both Source and Target recovery log for data recovery

• Initial table state on Source recovery log• Writes/Mutations on Target recovery log

• Transfer ownership back to Source in case of crash• Migration cancelled• Recovery involves both recovery logs

• Source takes a dependency on Target recovery log at migration start

• Stored reliably at the cluster coordinator• Identifies position after which mutations present

41

Safely Transfer Ownership At Migration Start

Safely Delay Replication Till All Data Has Been Moved

Page 42: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Performance of Rocksteady

42

Time Since Experiment Start (Seconds)

Med

ian

Acce

ss L

aten

cy (µ

s)

Rocksteady

Source Keeps Ownership+

Sync Replication

YCSB-B, 300 Million objects (30 B key, 100 B value), migrate half

Page 43: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Performance of Rocksteady

43

Time Since Experiment Start (Seconds)

Med

ian

Acce

ss L

aten

cy (µ

s)

Rocksteady

Source Keeps Ownership+

Sync Replication

Median latencybetter after~14 seconds

YCSB-B, 300 Million objects (30 B key, 100 B value), migrate half

Page 44: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Performance of Rocksteady

44

Time Since Experiment Start (Seconds)

Med

ian

Acce

ss L

aten

cy (µ

s)

Rocksteady

Source Keeps Ownership+

Sync Replication

Median latencybetter after~14 seconds

28% fastermigration

YCSB-B, 300 Million objects (30 B key, 100 B value), migrate half

Page 45: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Related Work• Dynamo: Pre-partition hash keys

• Spanner: Applications given control over locality (Directories)

• FaRM and DrTM: Re-use in-memory redundancy for migration

• Squall: Reconfiguration protocol for H-Store• Early ownership transfer• Paced background migration• Fully partitioned, serial execution, no synchronization

• Each migration pull stalls execution• Synchronous replication at the target

45

Page 46: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Conclusion• Distributed low-latency in-memory key-value stores are emerging

• Predictable response times ~10 µs median, ~60 µs 99.9th-tile

• Problem: Must migrate data between servers• Minimize performance impact of migration → go slow?• Quickly respond to hot spots, skew shifts, load spikes → go fast?

• Solution: Fast data migration with low impact• Leverage skew: Transfer ownership before data, move hot data first• Low priority, parallel and adaptive migration

• Result: Migration protocol for RAMCloud in-memory key-value store• Migrates at 758 MBps with 99.9th-tile latency < 250 µs

Source Code: https://github.com/utah-scs/RAMCloud/tree/rocksteady-sosp201746

Page 47: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Backup Slides

47

Page 48: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady Tail Latency Breakdown

050

100150200250300

Rocksteady

48

Page 49: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady Tail Latency Breakdown

050

100150200250300

Rocksteady

49

• Disabling parallel pulls brings tail latency down to 160 µsec

Page 50: Rocksteady: Fast Migration for Low-Latency In-memory StorageCrash Safety During Migration • Need both Source and Target recovery log for data recovery • Initial table state on

Rocksteady Tail Latency Breakdown

050

100150200250300

Rocksteady

50

• Disabling parallel pulls brings tail latency down to 160 µsec

• Synchronous on-demand pulls further brings tail latency down to 135 µsec