
Dynamic Performance Profiling of Data Caches
Hjortur Bjornsson (University of Iceland), Ymir Vigfusson (Emory University / Reykjavik University), Trausti Saemundsson (Reykjavik University), Gregory Chockler (Royal Holloway, University of London)

TRANSCRIPT

Slide 1

Title slide: authors and affiliations as above.

Slide 2

[Figure: clients send queries to memcache servers and receive results; misses fall through to the database tier.]

Slide 3

[Same figure.] Too many cache servers is a waste of resources.

Slide 4

[Same figure, dramatization.] Too few cache servers overload the database.

Slide 5

How do we optimize cache resources?

Slide 6

The key parameter is the cache size. [Figure: hit rate curve, marking the hit rate for the current allocation.]

Slide 7

How do we optimize cache resources?
  • Efficiency: time overhead, space overhead.
  • Accuracy: high fidelity to true hit rate curves; provable guarantees.
  • Usability: simple interface; modularity.

Slide 8

MIMIR architecture. [Figure: a cache server couples its replacement algorithm with an HRC estimator backed by a ghost list/ghost filter. Each Get/Set(e) request generates Hit(e), Miss(e), Set(e), and Evict(e) events for the estimator, which applies an aging policy and exposes Export-HRC().]

Slide 9

Inclusion property: the contents of a smaller cache are contained in a bigger cache given the same input. Holds for LRU, LFU, OPT, ... Idea: produce LRU hit rate curves by tracking each item's distance from the head of the LRU list (its stack distance).

Slide 10

Mattson et al. (1970): on every hit, determine the item's stack distance and accumulate it in a hit rate curve PDF (stack distance vs. number of hits). Walking a linked list on every access is inefficient.

Slide 11

Bennett & Kruskal (1975), Almási et al. (2002): could instead use self-balancing binary search trees (AVL trees, red-black trees, ...) for O(log N) accesses. But trees accentuate lock contention, hurting performance.

Slide 12

Prior work focused on exact results and on very high stack distances. Can we trade off accuracy for efficiency?
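To make the Mattson procedure concrete, here is a minimal Python sketch (my own illustration, not code from the talk) of linked-list stack-distance profiling and the hit rate curve derived from it:

```python
from collections import Counter

def stack_distances(trace):
    """Mattson-style profiling: on every hit, an item's depth in the
    LRU stack is its stack distance; a cache of size C would have
    served the hit exactly when that distance is < C."""
    stack = []           # front = most recently used
    hist = Counter()     # stack distance -> number of hits
    for key in trace:
        if key in stack:
            d = stack.index(key)   # the O(N) list walk the slide warns about
            hist[d] += 1
            stack.pop(d)
        stack.insert(0, key)       # move (or add) the item to the front
    return hist

def hit_rate_curve(hist, max_size):
    """hrc[c] = number of hits a cache of size c+1 would have served."""
    hrc, total = [], 0
    for d in range(max_size):
        total += hist[d]
        hrc.append(total)
    return hrc
```

For the toy trace "abacba", this counts one hit at distance 1 and two at distance 2, so a cache of two items serves one hit and a cache of three serves all three.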
Slide 13

Extend the LRU list with dataless ghost entries: track 2N entries for a cache of N items.

Slide 14

[Figure.]

Slide 15

[Figure: LRU list before a hit on item e, divided into B buckets — bucket #0: g,h,u; bucket #1: e,r,t; bucket #2: f; bucket #3: b,c,d,a.] Each item tracks what bucket it is in.

Slide 16

PDF update: update the statistics for e's bucket in the hit rate curve PDF (stack distance vs. estimated number of hits), distributing a unit area uniformly across the bucket.

Slide 17

Move e to the front and tag it with the first bucket [figure: the list becomes e,g,h,u | r,t | f | b,c,d,a]. The unit area remains uniformly distributed in the bucket. Overflow looming: the first bucket is filling up.

Slide 18

When the first bucket is full, perform aging:
  • Stacker: walk the list and bump items below the average reuse distance to an older bucket on the right. O(B) amortized.
  • Rounder: decrement the bucket identifiers, shifting the frame of reference, and coalesce the two oldest buckets. O(1).

Slide 19

Periodically calculate and export the hit rate curve: exponentially average the PDF, then accumulate it into a CDF (stack distance vs. cumulative hits).

Slide 20

Summary: maintain a ghost list of length N. On a hit on item e: update the PDF statistics for e's bucket (O(log B)) and move e to the front of the list, tagged with the first bucket (O(1)). When the front bucket is full, perform aging (O(B) amortized for Stacker, O(1) for Rounder). Periodically calculate and export the HRC.

Slide 21

[Figure: hit rate curve estimates for ROUNDER and STACKER vs. the true curve.] Accuracy: 98–99.8%.

Slide 22

[Figure.]

Slide 23

[Figure: accuracy argument — the true stack distance (OPT) lies among the possible locations within the bucket over which the algorithm (ALG) spreads its unit area in the estimated PDF.]

Slide 24

Memcached + YCSB. Mean absolute error by number of buckets:

#Buckets | MAE
       8 | 1.2%
      16 | 0.6%
      32 | 0.4%
      64 | 0.3%
     128 | 0.3%

Testbed: each node has 6 Intel Xeon quad-cores @ 2.4 GHz, 48 GB DRAM, a 40 Gbps QDR InfiniBand interconnect, and shared storage.

Slide 25

Memcached + YCSB (same table): 2–5% throughput and latency degradation.

Slide 26

Related work:

Paper | Venue | Area | Key idea | Online | Precision | Parallel
Mattson et al. | 1970 | Storage | Stack distance, LRU linear search | No | Exact | No
Kim et al. | OSDI '00 | V-Memory | | No | Approx. | No
Almási et al. | MSP '02 | Storage | LRU AVL-tree | No | Exact | No
Ding & Zhong | PLDI '03 | Compilers | Compressed trees | No | Approx. | No
Zhou et al. | ASPLOS '04 | V-Memory | | No | |
Geiger | ASPLOS '06 | V-Memory | Infer page faults from I/O, ghost lists | No | Approx. | No
RapidMRC | ASPLOS '09 | V-Memory | Fixed LRU buckets | No | Approx. | No
Hwang & Wood | ICAC '13 | Network caches | Hash rebalancing | Yes | N/A |
Wires et al. | OSDI '14 | Storage | LRU counter stacks | No* | Approx. | No
MIMIR | SoCC '14 | Network caches | Variable LRU buckets | Yes | Approx. | Yes

Slide 27

Optimizing cache resources by profiling hit rates online. Efficiency: 2–5% performance degradation. Accuracy: 98–99.8% on traces. Usability: MIMIR is modular; simple algorithms.

Slide 28

[Figure.]

Slide 29

[Figure.]

Slide 30

Optimizing cache resources by profiling hit rates online. Efficiency: time O(log B), space O(1), 0–2% throughput degradation. Scalability: online profiling; composable estimates. Accuracy: 98–99.8% on traces; error = O(1/B). Open questions: Can we do cost-benefit analysis of other cloud services? What should be evicted from a distributed cache? How should variable miss penalties be treated? Where should we place data in a cache hierarchy? Can we relieve cache hot spots?

Slide 31

Divide cache resources between services: progressively allocate space to the service with the highest marginal hit rate. Optimal for concave HRCs. [IBM J. of R&D '11, LADIS '11]

Slides 32–35

[Figures.]
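As a rough illustration of the bucketing scheme from Slides 15–20, the following is a minimal single-threaded Python sketch of a ROUNDER-style estimator. All names are my own, the ghost filter and eviction handling are omitted, and the real MIMIR implementation differs (e.g., it is parallel and uses faster prefix-count maintenance):

```python
from collections import Counter, deque

class RounderHRC:
    """Approximate stack-distance PDF using B LRU buckets.
    ROUNDER aging: shift the frame of reference and coalesce
    the two oldest buckets in O(1)."""

    def __init__(self, num_buckets=8, bucket_capacity=4):
        self.B = num_buckets
        self.cap = bucket_capacity
        self.counts = deque([0] * num_buckets)  # counts[0] = youngest bucket
        self.tag = {}        # item -> epoch of the bucket it was filed under
        self.epoch = 0       # absolute id of the current youngest bucket
        self.pdf = Counter() # estimated stack-distance histogram

    def _index(self, item):
        # Bucket position, 0 = youngest; very old tags clamp to the oldest.
        return min(self.epoch - self.tag[item], self.B - 1)

    def on_access(self, item):
        if item in self.tag:                    # hit
            i = self._index(item)
            lo = sum(self.counts[j] for j in range(i))  # items in younger buckets
            n = self.counts[i]
            for d in range(lo, lo + n):         # unit area spread over the bucket
                self.pdf[d] += 1.0 / n
            self.counts[i] -= 1
        self.tag[item] = self.epoch             # move (or add) to youngest bucket
        self.counts[0] += 1
        if self.counts[0] >= self.cap:          # front bucket full: age (ROUNDER)
            oldest = self.counts.pop()
            self.counts[-1] += oldest           # coalesce the two oldest buckets
            self.counts.appendleft(0)           # open a fresh youngest bucket
            self.epoch += 1                     # shift the frame of reference
```

For example, after accesses a, b, a the hit on a spreads its unit of hit mass over the two slots of the youngest bucket (0.5 at distances 0 and 1), matching the uniform-spread rule of Slide 16.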