Design Patterns for Tunable and Efficient SSD-based Indexes


Page 1: Design Patterns for Tunable and Efficient SSD-based Indexes

Ashok Anand, Aaron Gember-Jacobson, Collin Engstrom, Aditya Akella


Page 2: Design Patterns for Tunable and Efficient SSD-based Indexes

Large hash-based indexes

• WAN optimizers [Anand et al. SIGCOMM '08]
• De-duplication systems [Quinlan et al. FAST '02]
• Video proxy [Anand et al. HotNets '12]

≈20K lookups and inserts per second (1Gbps link)
≥32GB hash table

Page 3: Design Patterns for Tunable and Efficient SSD-based Indexes

Use of large hash-based indexes

• WAN optimizers
• De-duplication systems
• Video proxy

Where to store the indexes?

Page 4: Design Patterns for Tunable and Efficient SSD-based Indexes

Where to store the indexes?

[Figure: comparing storage options; annotations mark the SSD as "8x less" and "25x less" relative to the alternatives.]

Page 5: Design Patterns for Tunable and Efficient SSD-based Indexes

What’s the problem?

• Need domain/workload-specific optimizations for an SSD-based index with ↑ performance and ↓ overhead (false assumption!)

• Existing designs have…
  – Poor flexibility – they target a specific point in the cost-performance spectrum
  – Poor generality – they only apply to specific workloads or data structures

Page 6: Design Patterns for Tunable and Efficient SSD-based Indexes

Our contributions

• Design patterns that ensure:
  – High performance
  – Flexibility
  – Generality

• Indexes based on these principles:
  – SliceHash
  – SliceBloom
  – SliceLSH

Page 7: Design Patterns for Tunable and Efficient SSD-based Indexes

Outline

• Problem statement
• Limitations of state-of-the-art
• SSD architecture
• Parallelism-friendly design patterns
  – SliceHash (streaming hash table)
• Evaluation

Page 8: Design Patterns for Tunable and Efficient SSD-based Indexes

State-of-the-art SSD-based index

• BufferHash [Anand et al. NSDI '10]
  – Designed for high throughput

[Figure: inserts (KA,VA; KB,VB; KC,VC) hash into slots 0-3 of an in-memory incarnation; full incarnations are flushed to the SSD, and a per-incarnation Bloom filter is consulted before reading an incarnation from flash.]

4 bytes per K/V pair!
16 page reads in worst case! (average: ≈1)

Page 9: Design Patterns for Tunable and Efficient SSD-based Indexes

State-of-the-art SSD-based index

• SILT [Lim et al. SOSP '11]
  – Designed for low memory + high throughput

[Figure: keys flow through a log store, then a hash table, and finally a sorted store on the SSD; a small in-memory index points into each store.]

≈0.7 bytes per K/V pair
33 page reads in worst case! (average: 1)
High CPU usage!

Target specific workloads and objectives → poor flexibility and generality
Do not leverage internal parallelism

Page 10: Design Patterns for Tunable and Efficient SSD-based Indexes

SSD Architecture

[Figure: an SSD controller fans out over 32 channels to 128 flash memory packages; each package contains dies, each die contains planes with a data register, and each plane holds blocks of pages.]

How does the SSD architecture inform our design patterns?

Page 11: Design Patterns for Tunable and Efficient SSD-based Indexes

Four design principles

I. Store related entries on the same page
II. Write to the SSD at block granularity
III. Issue large reads and large writes
IV. Spread small reads across channels

[Figure: the principles mapped onto the SSD architecture: pages and blocks inside the flash memory packages, reached over the channels.]

SliceHash

Page 12: Design Patterns for Tunable and Efficient SSD-based Indexes

I. Store related entries on the same page

• Many hash table incarnations, like BufferHash

[Figure: each SSD page holds sequential slots from a specific incarnation, so a lookup of key K must check its slot in every incarnation.]

Multiple page reads per lookup!

Page 13: Design Patterns for Tunable and Efficient SSD-based Indexes

I. Store related entries on the same page

• Many hash table incarnations, like BufferHash
• Slicing: store the same hash slot from all incarnations on the same page (see the sketch below)

[Figure: a slice is a specific slot from all incarnations; a page holds a few complete slices.]

Only 1 page read per lookup!
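A minimal sketch of the slice lookup path, assuming hypothetical sizes (4KB pages, 16B entries, 32 incarnations; `read_page` is a stand-in for a raw SSD page read, not the paper's implementation):

```python
# Slicing sketch: a page holds a few slices, where slice i is slot i from
# ALL incarnations, so a lookup needs exactly one SSD page read.
PAGE_SIZE = 4096          # bytes per flash page (assumed)
ENTRY_SIZE = 16           # bytes per key/value pair (assumed)
NUM_INCARNATIONS = 32     # incarnations per SliceTable

SLOTS_PER_PAGE = PAGE_SIZE // ENTRY_SIZE // NUM_INCARNATIONS  # 8 slices/page

def lookup(read_page, slot, key):
    """Return the freshest value for `key` stored in hash slot `slot`."""
    page = read_page(slot // SLOTS_PER_PAGE)   # the single SSD page read
    s = slot % SLOTS_PER_PAGE
    # page[i][s] is slot s of incarnation i; scan newest incarnation first
    # so fresher values win.
    for i in reversed(range(NUM_INCARNATIONS)):
        entry = page[i][s]                     # (key, value) or None
        if entry is not None and entry[0] == key:
            return entry[1]
    return None
```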

Page 14: Design Patterns for Tunable and Efficient SSD-based Indexes

II. Write to the SSD at block granularity

• Insert into a hash table incarnation in RAM
• Divide the hash table so all slices fit into one block (sizing sketched below)

[Figure: inserts (KA,VA through KF,VF) fill the slots of the in-RAM incarnation; its contents are merged slice-by-slice into the SliceTable, which occupies exactly one SSD block.]
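A back-of-the-envelope sizing rule, assuming a 128KB erase block (the block size is not stated on this slide):

```python
# Sizing sketch (assumed sizes): choose slots per SliceTable so that all
# slices of all incarnations fill exactly one erase block, making every
# flush a single block-granularity write.
BLOCK_SIZE = 128 * 1024   # bytes per SSD erase block (assumed)
ENTRY_SIZE = 16           # bytes per key/value pair (assumed)
NUM_INCARNATIONS = 32

entries_per_block = BLOCK_SIZE // ENTRY_SIZE             # 8192 entries
SLOTS_PER_TABLE = entries_per_block // NUM_INCARNATIONS  # 256 slots
# The in-RAM incarnation therefore also has 256 slots; when it fills, it is
# merged into the on-SSD SliceTable with one block write.
```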

Page 15: Design Patterns for Tunable and Efficient SSD-based Indexes

III. Issue large reads and large writes

[Figure: flash packages (each with a page register) on multiple channels, alongside a plot of read throughput (MB/sec read) vs. read size (1KB to 128KB); annotations mark the page size, channel parallelism, and package parallelism. Larger reads engage more channels, then more packages.]

Page 16: Design Patterns for Tunable and Efficient SSD-based Indexes

III. Issue large reads and large writes

[Figure: write throughput (MB/sec written) vs. number of threads (2 to 30) for 128KB, 256KB, and 512KB writes; annotations mark the block size and channel parallelism.]

SSD assigns consecutive chunks (4 pages / 8KB) to different channels

Page 17: Design Patterns for Tunable and Efficient SSD-based Indexes

III. Issue large reads and large writes

• Read entire SliceTable into RAM
• Write entire SliceTable onto SSD

[Figure: the SliceTable block is read into RAM; the in-RAM incarnation's entries (KA,VA; KD,VD; KF,VF) replace the oldest incarnation's slices; the updated SliceTable is written back as one block. A sketch of this update path follows.]
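A minimal sketch of that update path, under the sizes assumed earlier; `read_block` and `write_block` are hypothetical stand-ins for large (block-sized) SSD reads and writes:

```python
# Update-path sketch: one large read, an in-memory merge, one large write.
def flush(read_block, write_block, table_no, ram_incarnation):
    slicetable = read_block(table_no)         # one block-sized read into RAM
    slicetable.pop(0)                         # retire the oldest incarnation
    slicetable.append(list(ram_incarnation))  # in-RAM table becomes the newest
    write_block(table_no, slicetable)         # one block-sized write back
    return []                                 # caller starts a fresh in-RAM incarnation
```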

Page 18: Design Patterns for Tunable and Efficient SSD-based Indexes

IV. Spread small reads across channels

• Recall: SSD writes consecutive chunks (4 pages) of a block to different channels
  – Use existing techniques to reverse engineer [Chen et al. HPCA '11]
  – SSD uses write-order mapping

channel for chunk i = i modulo (# channels)
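In code, the write-order mapping above might look like this (a sketch; 4-page chunks and 32 channels assumed from the earlier slides):

```python
# Write-order channel mapping: consecutive chunks of a block are striped
# round-robin across the channels.
PAGES_PER_CHUNK = 4   # pages per chunk (from the slide)
NUM_CHANNELS = 32     # channels (from the SSD architecture slide)

def channel_for_chunk(chunk):
    return chunk % NUM_CHANNELS

def channel_for_page(page):
    return channel_for_chunk(page // PAGES_PER_CHUNK)
```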

Page 19: Design Patterns for Tunable and Efficient SSD-based Indexes

IV. Spread small reads across channels

• Estimate channel using slot # and chunk size
• Attempt to schedule 1 read per channel (see the sketch below)

(slot # * pages per slot) modulo (# channels * pages per chunk)

[Figure: pending lookups are binned into per-channel queues (Channel 0 through Channel 3); one read per channel is issued in parallel each round.]
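A hedged reconstruction of that scheduler: the slide's formula gives a slot's page offset within one stripe across all channels, and dividing by the chunk size yields the channel estimate. `PAGES_PER_SLOT` is a hypothetical parameter (one page per slice is assumed here):

```python
from collections import defaultdict

PAGES_PER_CHUNK = 4
NUM_CHANNELS = 32
PAGES_PER_SLOT = 1    # assumed: each slice occupies one page

def channel_for_slot(slot):
    # The slide's formula: offset within one stripe across all channels...
    stripe_offset = (slot * PAGES_PER_SLOT) % (NUM_CHANNELS * PAGES_PER_CHUNK)
    # ...then the chunk within the stripe identifies the channel.
    return stripe_offset // PAGES_PER_CHUNK

def schedule(pending_slots):
    """Yield rounds of reads, at most one per estimated channel per round."""
    queues = defaultdict(list)
    for slot in pending_slots:
        queues[channel_for_slot(slot)].append(slot)
    while any(queues.values()):
        yield [q.pop(0) for q in queues.values() if q]  # issue these in parallel
```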

Page 20: Design Patterns for Tunable and Efficient SSD-based Indexes

SliceHash summary

[Figure: the complete design. Inserts (KA,VA through KF,VF) fill an in-memory incarnation; a page holds a slice (a specific slot from all incarnations); a block holds the whole SliceTable, which is read and written in full when updating.]

Page 21: Design Patterns for Tunable and Efficient SSD-based Indexes

Evaluation: throughput vs. overhead

Setup: 128GB Crucial M4 SSD; 2.26GHz 4-core CPU; 8B keys, 8B values; 50% insert / 50% lookup workload.

[Figure: memory (bytes/entry), CPU utilization (%), and throughput (K ops/sec) for SliceHash, BufferHash, and SILT; annotations mark SliceHash at ↑6.6x / ↓12% and ↑2.8x / ↑15% relative to the prior systems.]

See paper for theoretical analysis

Page 22: Design Patterns for Tunable and Efficient SSD-based Indexes

Evaluation: flexibility

• Trade-off memory for throughput

[Figure: throughput (K ops/sec) and memory (bytes/entry) for SliceHash (SH) with 64, 48, 32, and 16 incarnations vs. BufferHash and SILT, under the 50% insert / 50% lookup workload; varying the number of incarnations moves SliceHash along the memory-vs-throughput trade-off.]

Use multiple SSDs for even ↓ memory use and ↑ throughput

Page 23: Design Patterns for Tunable and Efficient SSD-based Indexes

Evaluation: generality

• Workload may change

[Figure: throughput (K ops/sec) under lookup-only, mixed, and insert-only workloads, plus memory (bytes/entry) and CPU utilization (%), for SliceHash (SH), BufferHash (BH), and SILT; annotations note memory "Decreasing!" and CPU utilization "Constantly low!".]

Page 24: Design Patterns for Tunable and Efficient SSD-based Indexes

Summary

• Present design practices for low-cost and high-performance SSD-based indexes

• Introduce slicing to co-locate related entries and leverage multiple levels of SSD parallelism

• SliceHash achieves 69K lookups/sec (≈12% better than prior works), with consistently low memory (0.6B/entry) and CPU (12%) overhead


Page 25: Design Patterns for Tunable and Efficient SSD-based Indexes

Evaluation: theoretical analysis

• Parameters
  – 16B key/value pairs
  – 80% table utilization
  – 32 incarnations
  – 4GB of memory
  – 128GB SSD
  – 0.31ms to read a block
  – 0.83ms to write a block
  – 0.15ms to read a page

SliceHash: memory overhead 0.6 B/entry; insert cost avg ≈5.7μs, worst 1.14ms; lookup cost avg & worst 0.15ms
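A quick sanity check of those numbers, assuming a 128KB erase block (the block size is not stated on the slide): the worst-case insert pays one block read plus one block write, and the average amortizes that over the entries flushed per block.

```python
# Back-of-the-envelope insert cost (assumed 128KB block; the other
# parameters come from the slide above).
BLOCK_SIZE = 128 * 1024   # bytes (assumed)
ENTRY_SIZE = 16           # 16B key/value pairs
INCARNATIONS = 32
UTILIZATION = 0.80

# Entries one in-RAM incarnation contributes per flush: its 1/32 share of
# the block, at 80% utilization.
entries_per_flush = BLOCK_SIZE / ENTRY_SIZE / INCARNATIONS * UTILIZATION  # ≈205

worst_ms = 0.31 + 0.83                        # block read + block write = 1.14ms
avg_us = worst_ms * 1000 / entries_per_flush  # ≈5.6μs, close to the slide's ≈5.7μs
print(f"insert: avg ≈ {avg_us:.1f}μs, worst = {worst_ms:.2f}ms")
```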

Page 26: Design Patterns for Tunable and Efficient SSD-based Indexes

Evaluation: theoretical analysis

             Memory overhead   Insert cost                 Lookup cost
SliceHash    0.6 B/entry       avg ≈5.7μs, worst 1.14ms    avg & worst 0.15ms
BufferHash   4 B/entry         avg ≈0.2μs, worst 0.83ms    avg ≈0.15ms, worst 4.8ms