Post on 04-Jul-2020

HashCache: Cache Storage for the Next Billion

Anirudh Badam, KyoungSoo Park, Vivek S. Pai, Larry L. Peterson

Princeton University

Next Billion Internet Users

• Schools, urban middle class in developing regions
• Affordable hardware
  • OLPC and Intel Classmate
• Expensive Internet
  • $1500+ per month per Mbps
  • Unlikely to improve in the near future

[Figure callouts: $200; $1500 per month]

Bandwidth Saving

• Connectivity is a precious resource
• Saving bandwidth important
  • Disk caches reduce network bandwidth requirement
• Good news - disk is very cheap
  • $100/TB

[Figure labels: Internet, Cache]

Why Large Caches?

• Larger bandwidth savings
  • Refreshes cheaper than re-fetches
  • Overnight prefetch, content push from peers
• Good offline behavior
  • Preload websites
  • Enables local search
• Save even on dynamic content
  • WAN Acceleration = packet caching

What is the Cost?

• In-memory data structures
  • Hash table avoids seeks for misses
  • Cache replacement (LRU, etc)
• RAM index size per TB
  • Open Source (Squid) - 10 GB
  • Commercial (Tiger) - 5 GB
• Cannot use laptops for cache
  • 2 servers, $2K each = 20 laptops

[Figure callouts: 70 seeks per sec; x 20]

Our Solution

• HashCache: storage engine w/ plug-in indexing
  • 6 schemes in paper, 3 shown here
• New Web proxy using HashCache engine
• Efficiency and Performance:
  • Range of policies trading speed and memory
  • 20-50x less RAM for Squid-like speed
  • 6-10x less RAM vs Tiger (Commercial)
• Can now use cheap laptops vs servers
  • Even for TB-sized caches

[Figure: chart with axes "Performance: req/sec/disk" (ticks 0 to 350) and "Gigabytes/Dollar" (ticks 0 to 5); labels: HashCache, Current, Squid, Tiger; an annotation marks the "Better" direction.]

A Brief History of Cache

[Figure, Squid: the URL maps to "ddddhhhhbbbb", whose three groups name Directory Level 1, Directory Level 2, and the Actual File; an In-memory Hashtable is also shown.]

[Figure, Tiger: the URL maps to a hash_value; Memory holds hash_value and offset; the Filesystem holds a Circular Log.]

• Open Source Implementation - Squid
  • Multiple seeks for hit, miss and write
  • Dependent on default filesystems
• Commercial/High Performance - Tiger
  • One seek for hit
  • Custom file layout

Index Element Sizes

Functionality             Implementation Choice                        Squid (Bits)  Tiger (Bits)
Existence Identification  Hashtable Chaining Pointers                            96            96
                          Hash                                                  160            32
Replacement Policy        LRU List Pointers                                      64            64
Location Information      Disk Offset, Version Number, etc                        0            40
Other                     Expiration Date, Size, HTTP header info etc           240             0
Total                                                                           560           232

Focusing mainly on reducing the size of the index

Revisiting the Index...

[Figure: stack of Application, Cache Manager, In-memory Index, and Reliable Filesystem. Callouts: cache size limited by memory size; performance depends on disk seeks; reduce the dependency; optimize for seeks.]

• Eliminate(?) in-memory index
• Need membership and location information
• Use disk as hash table
  • On disk data structures for key lookup
  • Store the object as values for the keys

[Figure: the stack above (In-memory Index, Application, Cache Manager, Reliable Filesystem) shown beside the HashCache Engine stack (Application Logic, In-memory Index, HashCache Engine).]

HashCache: Basic Policy

[Figure: the URL is hashed to hash_value (H bits); t = hash_value % N selects the t-th of N contiguous blocks in the Filesystem (Disk Table), which holds the data; a Circular Log with its head is also shown.]

• Advantages
  • No index memory needed
  • Tuned for one seek for most objects
• Disadvantages
  • One seek per miss
  • No collision control
  • No cache replacement policy
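The Basic policy's lookup path can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not the authors' implementation: the disk table is modeled as an in-memory list of N slots, each object fits in one block, SHA-1 stands in for the unspecified URL hash, and the names `BasicTable`, `put`, and `get` are hypothetical.

```python
import hashlib


class BasicTable:
    """Toy model of a disk table with N blocks and no in-memory index."""

    def __init__(self, num_blocks):
        self.num_blocks = num_blocks        # N contiguous blocks
        self.blocks = [None] * num_blocks   # each slot holds (url, data) or None
        self.seeks = 0                      # counts simulated disk accesses

    def _block_for(self, url):
        # hash_value % N picks the t-th block, as in the slide's figure.
        digest = hashlib.sha1(url.encode("utf-8")).digest()
        hash_value = int.from_bytes(digest[:8], "big")
        return hash_value % self.num_blocks

    def put(self, url, data):
        # No collision control: a new object overwrites whatever is in its block.
        self.seeks += 1
        self.blocks[self._block_for(url)] = (url, data)

    def get(self, url):
        # Every lookup costs one seek, including misses.
        self.seeks += 1
        entry = self.blocks[self._block_for(url)]
        if entry is not None and entry[0] == url:
            return entry[1]
        return None


table = BasicTable(num_blocks=1024)
table.put("http://example.org/a", b"page a")
hit = table.get("http://example.org/a")
miss = table.get("http://example.org/never-stored")
```

The miss still increments `seeks`, which is the "one seek per miss" disadvantage listed above.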

Collision Control

Chaining
• Does not transition well to disk-based
  • Multiple seeks per operation
  • Walking hash bin list
  • Global replacement policy crosses bins

Set Associativity (T-Ways)
• Fixed locations where each object can be found
• Allocated contiguously, read together
• Seek time dominates short read
• Eliminate global cache replacement policies
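A small sketch may help show why set associativity suits disks: all T candidate slots for a URL sit next to each other, so one seek fetches the whole set. The code below is an illustrative model only. The class and function names are hypothetical, SHA-1 is a stand-in hash, the "disk" is a Python list, and the within-set eviction is a simple oldest-first rule chosen for brevity rather than anything stated on the slides.

```python
import hashlib


def url_hash(url):
    """64-bit hash of a URL (SHA-1 is a stand-in for the unspecified hash)."""
    return int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest()[:8], "big")


class SetAssociativeTable:
    """Toy T-way table: each set is T adjacent slots read with one seek."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]  # each list holds <= ways entries
        self.seeks = 0

    def _read_set(self, url):
        # One simulated seek fetches the whole contiguous set.
        self.seeks += 1
        return self.sets[url_hash(url) % self.num_sets]

    def get(self, url):
        for stored_url, data in self._read_set(url):
            if stored_url == url:
                return data
        return None

    def put(self, url, data):
        # The write-back is folded into the same simulated seek for simplicity.
        entries = self._read_set(url)
        entries[:] = [e for e in entries if e[0] != url]
        if len(entries) == self.ways:
            entries.pop(0)  # local replacement: drop the oldest entry in this set
        entries.append((url, data))


table = SetAssociativeTable(num_sets=1, ways=8)  # one set forces every URL to collide
for i in range(8):
    table.put(f"http://example.org/{i}", i)
survivors = [table.get(f"http://example.org/{i}") for i in range(8)]
```

With a single set, eight colliding URLs all survive; a ninth insertion evicts only within that set, so no global replacement structure is needed.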

Reducing Seeks

• In-memory hash table
  • Too much memory for pointers

In-memory Hash Table (bits)
   32  Bin Pointers
   64  Chaining Pointers
   32  Hash
  128  Total

• Disk is already a hash table
  • Pointers not needed
  • Large bitmap with the same layout as the disk
  • Just store hash per URL

[Figure: an In-memory Bitmap with the same layout as the Disk Table; each H-bit entry corresponds to one Disk Block.]

• Original hash of the URL: 64 bits
• Eliminate bits for (same) bin # (2^28 objs, 8-way, #bins = 2^25 (S))
• Shrink hash size: Just to eliminate most false positives (8 bits)

Bits per URL
  64  Original hash
  39  64 - log(S)
   8  low FP hash
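The bit counts on this slide can be checked with a short calculation. The false-positive estimate at the end is an added illustration, not a figure from the slides: it assumes a full 8-way set and uniformly random hash bits.

```python
import math

objects = 2 ** 28        # cached objects, from the slide
ways = 8                 # 8-way set associativity
sets = objects // ways   # S = 2**25 bins

original_hash_bits = 64
bin_bits = int(math.log2(sets))                 # bits implied by the bin number
remaining_bits = original_hash_bits - bin_bits  # 64 - log2(S)
stored_bits = 8                                 # short hash kept per URL

# Chance that a URL not in the cache matches at least one of the
# `ways` stored hashes in its set (assumes a full set, random bits).
false_positive = 1 - (1 - 2 ** -stored_bits) ** ways

print(f"S = 2^{bin_bits}, remaining hash bits = {remaining_bits}")
print(f"false-positive rate with {stored_bits}-bit hashes: {false_positive:.4f}")
```

Under those assumptions only a few percent of lookups for absent URLs would cost a wasted seek, which is consistent with the slide's goal of eliminating "most" false positives.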

Cache Replacement

• Large disks: 10-100+ million objects
  • Global caching relevant when disk size ≈ working set
  • When disk >> working set, local policies ≈ global policies
• Local replacement benefits
  • 3 bits per URL
  • Performed on contiguous objects
  • False positives limited by set size

Bits per URL
  64  Original hash
  39  64 - log(S)
   8  low FP hash
  11  hash + rank

HashCache: SetMem Policy

[Figure: the URL is hashed to hash_value; t = hash_value % S selects the t-th set. Memory holds 11 bits per entry for the t-th set, the Filesystem holds the t-th set of data, and LRU is applied within the set; a log head is also shown.]

• Advantages
  • No seeks for most misses
  • 1 seek per read, 1 seek per write
  • Good hash + replacement in 11 bits
• Disadvantages
  • Writes still need seeks
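The following sketch shows how an 11-bit in-memory entry (an 8-bit hash plus a 3-bit LRU rank) can rule out most misses without touching the disk. It is an assumption-laden model rather than the authors' code: the way the tag is carved out of the hash, the class name `SetMemCache`, and the list-based "disk" are all hypothetical, and re-inserting an existing URL is not handled.

```python
import hashlib

WAYS = 8  # 8-way sets, so an LRU rank fits in 3 bits


def split_hash(url, num_sets):
    """Return (set number, 8-bit tag) derived from a 64-bit URL hash."""
    h = int.from_bytes(hashlib.sha1(url.encode("utf-8")).digest()[:8], "big")
    return h % num_sets, (h // num_sets) & 0xFF


class SetMemCache:
    def __init__(self, num_sets):
        self.num_sets = num_sets
        # In memory: per slot an 8-bit tag and a 3-bit rank (0 = most recent).
        self.tags = [[None] * WAYS for _ in range(num_sets)]
        self.ranks = [list(range(WAYS)) for _ in range(num_sets)]
        # "On disk": the objects themselves, in the same set layout.
        self.disk = [[None] * WAYS for _ in range(num_sets)]
        self.seeks = 0

    def _touch(self, t, way):
        # Make `way` the most recent; age every slot that was more recent.
        old = self.ranks[t][way]
        for w in range(WAYS):
            if self.ranks[t][w] < old:
                self.ranks[t][w] += 1
        self.ranks[t][way] = 0

    def get(self, url):
        t, tag = split_hash(url, self.num_sets)
        for way in range(WAYS):
            if self.tags[t][way] == tag:
                self.seeks += 1              # tag matched: read the slot
                stored = self.disk[t][way]
                if stored[0] == url:
                    self._touch(t, way)
                    return stored[1]
        return None                          # no tag matched: no seek needed

    def put(self, url, data):
        t, tag = split_hash(url, self.num_sets)
        way = self.ranks[t].index(WAYS - 1)  # least recently used slot
        self.tags[t][way] = tag
        self.disk[t][way] = (url, data)
        self.seeks += 1                      # writes still need a seek
        self._touch(t, way)


cache = SetMemCache(num_sets=4)
empty_lookup = cache.get("http://example.org/x")  # answered from memory alone
seeks_after_miss = cache.seeks
cache.put("http://example.org/a", b"a")
hit = cache.get("http://example.org/a")
```

A lookup in the empty cache returns without any seek, while the later write and hit cost one seek each, matching the advantages and the remaining disadvantage listed above.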

Further Reducing Seeks

• Storing objects by hash can produce random reads & writes

• Restructure on-disk table

• Store only hash, rank, offset

• Move all data to log

• Benefits

• Group writes amortize seeks

• Scheduling related writes enables read prefetch

• Both reads & writes < 1 seek

15

11 hash rank

43 hash

rank

offset

Page 104: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

Further Reducing Seeks

• Storing objects by hash can produce random reads & writes

• Restructure on-disk table

• Store only hash, rank, offset

• Move all data to log

• Benefits

• Group writes amortize seeks

• Scheduling related writes enables read prefetch

• Both reads & writes < 1 seek

15

0 HC-Basic

43 HC-Log

11 HC-SetMem

Page 105: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

Further Reducing Seeks

• Storing objects by hash can produce random reads & writes

• Restructure on-disk table

  • Store only hash, rank, offset

  • Move all data to log

• Benefits

  • Group writes amortize seeks

  • Scheduling related writes enables read prefetch

  • Both reads & writes take < 1 seek on average

[Chart: index memory per object, in bits. Squid 560, Tiger 232, HC-Basic 0, HC-Log 43, HC-SetMem 11]
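To make "store only hash, rank, offset" concrete, the sketch below packs the three fields into a single 43-bit integer, matching the 43 bits shown for HC-Log. The field widths (8-bit hash tag, 3-bit rank, 32-bit log offset) are an assumption chosen so that they sum to 43; the slides state only the total. The function names are hypothetical.

```python
# Assumed split of the 43 bits; only the total comes from the slides.
HASH_BITS, RANK_BITS, OFFSET_BITS = 8, 3, 32


def pack_entry(hash_tag: int, rank: int, offset: int) -> int:
    """Pack (hash, rank, offset) into one 43-bit integer."""
    if not 0 <= hash_tag < (1 << HASH_BITS):
        raise ValueError("hash_tag out of range")
    if not 0 <= rank < (1 << RANK_BITS):
        raise ValueError("rank out of range")
    if not 0 <= offset < (1 << OFFSET_BITS):
        raise ValueError("offset out of range")
    return (hash_tag << (RANK_BITS + OFFSET_BITS)) | (rank << OFFSET_BITS) | offset


def unpack_entry(entry: int):
    """Inverse of pack_entry: return (hash_tag, rank, offset)."""
    offset = entry & ((1 << OFFSET_BITS) - 1)
    rank = (entry >> OFFSET_BITS) & ((1 << RANK_BITS) - 1)
    hash_tag = entry >> (RANK_BITS + OFFSET_BITS)
    return hash_tag, rank, offset
```

With the offset kept in memory, a hit can go straight to the object's position in the log instead of to a fixed hash-determined disk location.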

Page 111: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

HashCache: Log Policy

[Diagram: URL → hash_value % S → t. In memory: the t-th set, 43 bits per object, with LRU replacement. On the filesystem: data written at the head of a circular log.]
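The flow in the diagram can be sketched as a toy cache. This is an illustration under stated assumptions, not the HashCache implementation: a `bytearray` stands in for the on-disk circular log, the set size of 8 and the 8-bit tag are assumed, and the class and method names are hypothetical. The URL is stored with the data so that a tag match can be verified after the single read.

```python
import hashlib


class LogCache:
    """Toy log-structured cache: small in-memory index, data in a circular log."""

    def __init__(self, num_sets: int, log_size: int, ways: int = 8):
        self.num_sets = num_sets        # S in the diagram
        self.ways = ways                # assumed set size
        self.log = bytearray(log_size)  # stands in for the on-disk circular log
        self.head = 0                   # next write position in the log
        # sets[t] is ordered most-recently-used first (position = LRU rank).
        # Each entry is (hash_tag, offset, length).
        self.sets = [[] for _ in range(num_sets)]

    def _locate(self, url: str):
        h = int.from_bytes(hashlib.sha1(url.encode()).digest()[:8], "big")
        return h % self.num_sets, (h // self.num_sets) & 0xFF

    def _drop_overwritten(self, start: int, end: int) -> None:
        # Simplification: a full scan to forget entries whose log bytes are
        # about to be overwritten. A real engine would avoid scanning.
        for entries in self.sets:
            entries[:] = [e for e in entries
                          if e[1] + e[2] <= start or e[1] >= end]

    def put(self, url: str, data: bytes) -> None:
        record = url.encode() + b"\0" + data  # keep the URL to verify hits
        if len(record) > len(self.log):
            raise ValueError("object larger than the log")
        if self.head + len(record) > len(self.log):
            self.head = 0                     # wrap around to the log start
        start, end = self.head, self.head + len(record)
        self._drop_overwritten(start, end)
        self.log[start:end] = record          # sequential write at the head
        self.head = end
        t, tag = self._locate(url)
        self.sets[t].insert(0, (tag, start, len(record)))
        del self.sets[t][self.ways:]          # evict beyond the LRU tail

    def get(self, url: str):
        t, tag = self._locate(url)
        entries = self.sets[t]
        for i, (etag, off, length) in enumerate(entries):
            if etag != tag:
                continue                      # decided in memory, no read
            record = bytes(self.log[off:off + length])  # the one "disk" read
            key, _, data = record.partition(b"\0")
            if key == url.encode():
                entries.insert(0, entries.pop(i))       # refresh LRU rank
                return data
        return None
```

All writes land at the head of the log, so consecutive `put` calls touch adjacent positions, which is what lets grouped writes share a seek.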

Page 120: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

Implementation

• HashCache storage engine with plug-in policies

• HashCache Web proxy using the storage engine

• Multiple apps on the same box, sharing memory

• 20,000 lines of C code for the proxy and 1,000 lines for the indexing policies

• Event-driven implementation with non-blocking I/O

• Design similar to that of the Flash Web server, with helpers for I/O and DNS lookups

• Balances load across multiple disks easily and makes scaling straightforward
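One way to picture a storage engine with plug-in indexing policies is sketched below. The interface is entirely hypothetical (the real engine is written in C and its API is not shown in the slides); it only illustrates how the engine can stay fixed while the index policy is swapped.

```python
from abc import ABC, abstractmethod


class IndexPolicy(ABC):
    """Hypothetical plug-in interface: maps a key to a storage location."""

    @abstractmethod
    def locate(self, key: str):
        """Return a location for key, or None if it is definitely absent."""

    @abstractmethod
    def record(self, key: str, location: int) -> None:
        """Remember that key now lives at location."""


class DictPolicy(IndexPolicy):
    """Simplest possible policy: a full in-memory map (high RAM cost)."""

    def __init__(self):
        self._map = {}

    def locate(self, key):
        return self._map.get(key)

    def record(self, key, location):
        self._map[key] = location


class StorageEngine:
    """Keeps values in an append-only list; the policy decides the indexing."""

    def __init__(self, policy: IndexPolicy):
        self.policy = policy
        self._blocks = []

    def put(self, key: str, value: bytes) -> None:
        self._blocks.append((key, value))
        self.policy.record(key, len(self._blocks) - 1)

    def get(self, key: str):
        loc = self.policy.locate(key)
        if loc is None:
            return None
        stored_key, value = self._blocks[loc]
        # Verify the key, since a compact policy may return false positives.
        return value if stored_key == key else None
```

A more memory-frugal policy would implement the same two methods while keeping only a few bits per object.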

Page 122: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

Evaluation - Web Polygraph

• De facto feature and performance testing tool for web proxies

• Compare all variants of HashCache with Squid and Tiger

Experiment Name | Setting | Configuration | Comparison
Low End | Small school using laptop | 1.4 GHz, 256 MB, 60 GB SATA | HashCache vs Squid vs Tiger
High End | ISP with high-end server | 2 GHz, 3.5 GB, 5x18 GB SCSI | HashCache-Log vs Squid vs Tiger
Large Disk | Large school with mini-tower | 1.4 GHz, 2 GB, 2x1 TB USB | HashCache-Log vs HashCache-SetMem

Page 126: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

Hit Rate Comparison

[Figure: bar chart of hit rate (axis 0 to 60) for HC-Basic, Squid, HC-SetMem, HC-Log, and Tiger, with a "Max" reference marked.]

Page 129: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

Low End Configuration

[Figure: bar chart of performance (req/sec, axis 0 to 300) for HC-Basic, Squid, HC-SetMem, Tiger, and HC-Log. Annotations give the share of RAM used for the index: 0%, 75%, 5%, 50%, 20%.]

• Open source and commercial caches could index only 18 GB

• HashCache could index 60 GB

Page 131: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

High End Configuration (5x18 GB disks)

[Figure: bar chart of performance (req/sec, axis 0 to 3,000) for Squid, Tiger, and HashCache-Log. Annotations give the share of RAM used for the index: 40%, 4%, 18%.]

Page 134: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

Index Efficiency

[Figure: horizontal bar chart of maximum disk size indexable with 1 GB of RAM (axis 0 to 6,000) for Squid, Tiger, HC-Log, and HC-SetMem. Throughput annotations: 40 reqs/sec, 300, 300, 65.]
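The relationship between index bits per object and the disk that 1 GB of RAM can index is simple arithmetic, sketched below. The per-object sizes (560, 232, 43, 11 bits) are the figures shown earlier in the deck; the 8 KB average object size is an assumption made for this sketch and does not appear in the slides.

```python
# Index bits per object, as shown earlier in the deck.
BITS_PER_OBJECT = {"Squid": 560, "Tiger": 232, "HC-Log": 43, "HC-SetMem": 11}

AVG_OBJECT_BYTES = 8 * 1024  # assumption: 8 KB average cached object
RAM_BYTES = 1 << 30          # 1 GB of RAM devoted entirely to the index


def max_disk_gb(bits_per_object: int) -> float:
    """Disk capacity (GB) whose objects can all be indexed in RAM_BYTES."""
    objects = (RAM_BYTES * 8) / bits_per_object   # how many entries fit
    return objects * AVG_OBJECT_BYTES / (1 << 30)


for name, bits in BITS_PER_OBJECT.items():
    print(f"{name:10s} {max_disk_gb(bits):8.0f} GB")
```

Under these assumptions the results range from roughly 117 GB for Squid to roughly 5,958 GB for HC-SetMem, which falls within the chart's 0 to 6,000 axis. HC-Basic is left out because it keeps no per-object index in RAM.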

Page 136: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

Large Disk Configuration (1 TB disk)

[Figure: bar chart of performance (req/sec, axis 0 to 300) for HC-Basic, HC-SetMem, and HC-Log. Index-size annotations: 150 MB, 600 MB, 0 MB.]

Page 137: HashCache: Cache Storage for the Next Billion · •HashCache: storage engine w/ plug-in indexing • 6 schemes in paper, 3 shown here • New Web proxy using HashCache engine •

Conclusions & Status

• New storage engine & Web cache

  • From no RAM per object to a tiny number of bits per object

  • 6-10x better than Tiger, 20-50x better than Squid

  • Enables large disks with only a laptop-class machine

  • More policies and details in the paper

• Suitable for developing-world usage

  • Current deployments: Ghana, Nigeria

  • Working with a school supplier on new deployments