Consistent Hashing: Load Balancing in a Changing World

David Karger, Eric Lehman, Tom Leighton, Matt Levine, Daniel Lewin, Rina Panigrahy
MIT

Load Balancing
Task: distribute items into buckets
» Data to memory locations
» Files to disks
» Tasks to processors
» Web pages to caches (our motivation)
Goal: even distribution

World Wide Web
[Diagram: browsers (clients) requesting pages directly from servers; server labels include LCS, ATT, CMU, IBM, X]

Hot Spots
[Diagram: many clients converge on a single server; server labels include LCS, ATT, CMU, IBM, X, THOR, CILK, W3C, PDOS]

Temporary Loads
For permanent loads, use a bigger server
Must also deal with “flash crowds”
» IBM chess match
» NASA
Inefficient to design for max load
» rarely attained
» much capacity wasted
Better to offload peak load elsewhere

Proxy Caches Balance Load

[Diagram: clients send requests through proxy caches rather than directly to the servers (LCS, ATT, CMU, IBM, X, THOR, CILK, W3C, PDOS)]

Proxy Caching
Old: server hit once for each browser
New: server hit once for each page
Adapts to changing access patterns

Proxy Caching
Every server can also be a cache
Provides a social good
Reduces load at sites you want to contact
Costs you little, if done right
» few accesses
» small amount of storage (times many servers)

Who Caches What?
Each cache should hold few items
» else cache gets swamped by clients
Each item should be in few caches
» else server gets swamped by caches
» and cache invalidations/updates expensive
Browser must know right cache
» could ask server for redirect
» server gets swamped by redirect requests

Hashing
Simple and powerful load balancing
Constant time to find bucket for item
Example: map to n buckets. Pick a, b:
y = ax + b (mod n)
Intuition: hash maps each item to one random bucket
» no bucket gets many items

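A minimal sketch of this standard scheme in Python (the constants a, b and all names below are illustrative, not from the talk):

    # Standard modular hashing: map integer keys to one of n buckets.
    a, b = 31, 17  # fixed "random" constants

    def bucket(item: int, n: int) -> int:
        """y = ax + b (mod n)"""
        return (a * item + b) % n

    print(bucket(12345, 10))  # constant time: one multiply, add, mod
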
Problem: Adding Caches
Suppose a new cache arrives. How to work it into the hash function?
Natural change: y = ax + b (mod n+1)
Problem: changes bucket for every item
» every cache will be flushed
» servers get swamped with new requests
Goal: when a bucket is added, few items move

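A quick check of how badly the natural change behaves, reusing the sketch above (constants illustrative):

    # Fraction of items whose bucket changes when n buckets become n + 1.
    a, b = 31, 17
    n = 100
    items = range(10_000)
    moved = sum(1 for x in items
                if (a * x + b) % n != (a * x + b) % (n + 1))
    print(moved / len(items))  # ~0.99: nearly every item moves buckets
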
Problem: Inconsistent Views
Each client knows about a different set of caches: its view
View affects choice of cache for item
With many views, each cache will be asked for the item
Item ends up in all caches: swamps server
Goal: item in few caches despite views

Problem: Inconsistent Views

[Slide sequence: five clients (My view, John’s view, Dave’s view, Al’s view, Mike’s view) each see a different set of four caches, numbered 0–3, and each compute ax + b (mod 4) = 2 for the same LCS item; “cache 2” is a different physical cache in each view, so the item ends up in every cache]

Consistent Hashing
A new kind of hash function
Maps any item to a bucket in my view
Computable in constant time, locally
» 1 standard hash function
Adding a bucket to a view takes logarithmic time
» logarithmic number of standard hash functions
Handles incremental and inconsistent views

Single View Properties
Balance: all buckets get roughly the same number of items (like standard hashing)
Smooth: when a kth bucket is added, only a 1/k fraction of the items move
» and only from O(log n) servers
» the minimum needed to preserve balance

Multiple View Properties
Consider n views, each of an arbitrary constant fraction of the buckets
Load: the number of items a bucket gets over all views is O(log n) times average
» despite views, load is balanced
Spread: over all views, each item appears in O(log n) buckets
» despite views, few caches for each item

Implementation

Use standard hash function H to map buckets and items to the unit interval
» “random” points spread uniformly
Item assigned to nearest bucket
[Diagram: bucket and item points on the unit interval; each item goes to its nearest bucket point]

Computation Cost
Bucket positions precomputed
To hash an item:
» compute H
» find nearest bucket point
O(log n) time using binary search
Constant time with auxiliary hash table

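A minimal sketch of the whole construction (class and helper names are mine; building H from SHA-1 is an assumption for illustration, and the unit interval is treated as a circle):

    import bisect
    import hashlib

    def H(key: str) -> float:
        """Standard hash function: map a string to a "random" point in [0, 1)."""
        digest = hashlib.sha1(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    class ConsistentHash:
        def __init__(self, buckets=()):
            self.points = []   # sorted bucket positions on the unit interval
            self.names = {}    # position -> bucket name
            for b in buckets:
                self.add(b)

        def add(self, bucket: str) -> None:
            p = H(bucket)
            bisect.insort(self.points, p)  # O(log n) search (list insert is O(n))
            self.names[p] = bucket

        def remove(self, bucket: str) -> None:
            p = H(bucket)
            self.points.remove(p)
            del self.names[p]

        def lookup(self, item: str) -> str:
            """Assign item to the nearest bucket point, found by binary search."""
            x = H(item)
            i = bisect.bisect(self.points, x)
            succ = self.points[i % len(self.points)]  # next point, wrapping around
            pred = self.points[i - 1]                 # previous point, wrapping around
            dist = lambda p: min(abs(p - x), 1 - abs(p - x))  # circular distance
            return self.names[min((succ, pred), key=dist)]
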
Balance
Bucket points uniformly distributed by H
Each bucket “owns” an equal portion of the line
Item position “random” by H
So an item is equally likely to land at any bucket
So all buckets get about the same # of items

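A quick empirical check with the sketch above (names made up). With a single point per bucket the variation is noticeable; the “logarithmic number of standard hash functions” mentioned earlier corresponds to giving each bucket several points to tighten the balance:

    from collections import Counter

    ch = ConsistentHash([f"cache{i}" for i in range(10)])
    loads = Counter(ch.lookup(f"page{i}") for i in range(10_000))
    print(loads)  # every cache gets a comparable share of the 10,000 items
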
Smoothness

To add a kth bucket, hash it to the line
Captures items nearest to it
» only 1/k fraction of total items
» only from 2 other buckets (its neighbor on each side)
[Diagram: a new bucket point lands on the line between old bucket points and captures only the items nearest to it]

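Checking smoothness with the sketch above (names made up):

    ch = ConsistentHash([f"cache{i}" for i in range(10)])
    items = [f"page{i}" for i in range(10_000)]
    before = {it: ch.lookup(it) for it in items}

    ch.add("cache10")  # an 11th bucket joins the view
    moved = [it for it in items if ch.lookup(it) != before[it]]
    print(len(moved) / len(items))  # on the order of 1/11
    # Every moved item lands in cache10; no other assignment changes.
    assert all(ch.lookup(it) == "cache10" for it in moved)
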
Low Spread
Some views might not see the nearest bucket to an item, and will hash it elsewhere
But every view will have a bucket near the item (by “random” placement)
So only buckets near the item will ever have to hold it
But only a few buckets are near the item (by “random” placement)

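A small simulation of spread using the class above (view sizes and counts are arbitrary choices):

    import random
    random.seed(0)  # reproducible illustration

    all_caches = [f"cache{i}" for i in range(100)]
    # 20 clients, each knowing a different random half of the caches
    views = [ConsistentHash(random.sample(all_caches, 50)) for _ in range(20)]
    holders = {view.lookup("page42") for view in views}
    print(len(holders))  # a handful of caches hold the item, not 20
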
Low Load
Bucket only gets item I if no other bucket is closer to I
Under any view, some bucket is close to I
» by “random” placement of buckets
So a bucket only gets items close to it
But any given item is unlikely to be close to it
So a bucket doesn’t get many items

Summary: Consistent Hashing
Trivial to implement (20 lines of code)
» don’t try this at home!
Fast to compute (no hidden constants)
Uniformly distributes items
Can cheaply add/remove buckets
Even with multiple views
» no bucket gets many items
» each item in only a few buckets

Caching
Consistent hashing is good for caching
» client maps known caches to unit interval
» when an item arrives, hash it to a cache
» server gets O(log n) requests for its own pages
Each server can also be a cache
» gets a small number of requests for others’ pages
Consistent hashing is robust
» caches can come and go
» different browsers can know different caches

Refinements
Browser bandwidth to machines may vary
If bandwidth to the server is high, unwilling to use a low-bandwidth cache
Consistently hash the item only to caches with bandwidth as good as the server’s
Theorem: all previous properties still hold
» uniform cache loads
» low server loads (few caches per item)

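A hedged sketch of this refinement with the class above (the bandwidth table and threshold rule are assumptions for illustration):

    # Hypothetical bandwidth estimates from this browser, in Mbit/s.
    bandwidth = {"server": 10.0, "cache0": 20.0, "cache1": 2.0, "cache2": 15.0}

    # Restrict the view to caches at least as fast as the origin server,
    # then consistent-hash among that restricted set.
    fast_caches = [c for c in bandwidth
                   if c != "server" and bandwidth[c] >= bandwidth["server"]]
    view = ConsistentHash(fast_caches)
    print(view.lookup("page42"))  # cache0 or cache2, never the slow cache1
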
Fault Tolerance
Suppose the contacted cache is down
Delete it from the bucket set; find the next closest bucket on the consistent hashing interval
Just a small change in view…
Even with many failures, the previous properties (uniform low load, etc.) still hold

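With the sketch above, a failure is just a small view change (names made up):

    ch = ConsistentHash([f"cache{i}" for i in range(10)])
    victim = ch.lookup("page42")
    ch.remove(victim)           # contacted cache is down: drop it from the view
    print(ch.lookup("page42"))  # item falls to the next closest bucket
    # Items assigned to the other nine caches are unaffected.
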
Hot pages
What if one page gets popular?
» cache responsible for it gets swamped
Use a tree of caches (e.g. Harvest, DNS)?
» cache at root gets swamped
Use a different tree for each page
» build it using consistent hashing (see the sketch below)
Balances load for hot pages and servers

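One plausible way to realize a per-page tree with the machinery above; the abstract binary tree, heap-style node numbering, and key scheme are my assumptions, not details from the talk:

    # Map each node of an abstract binary tree to a cache by hashing
    # "page/node", so every page gets its own (pseudo)random tree of caches.
    ch = ConsistentHash([f"cache{i}" for i in range(100)])

    def tree_path(page: str, leaf: int, depth: int) -> list[str]:
        """Caches along the path from an abstract leaf up to the root."""
        path, node = [], leaf
        for _ in range(depth):
            path.append(ch.lookup(f"{page}/{node}"))
            node //= 2  # step to the parent in heap numbering
        return path

    # A client picks a random leaf and forwards the request up this path,
    # so no single cache sees all the traffic for a hot page.
    print(tree_path("hotpage", leaf=13, depth=4))
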
Main Result
Using cache trees of logarithmic depth, for any set of page accesses, we can adaptively balance load such that every server gets at most log times the average load of the system (browser/server ratio).
(assorted theory caveats)

Conclusion
Consistent hashing balances load
Simple enough to be practical (?)
We’d like to build a distributed web cache
» need the help of a good systems student
Looking for other applications
» DNS lookup?
» PICS? URN resolution?
» NOWs? Distributed file systems?