efficient(mrc(construc0on( with(shards( - usenix · 2019. 12. 18. · mrc(algorithm(research(1965...
TRANSCRIPT
Efficient MRC Construc0on with SHARDS
Carl Waldspurger Nohhyun Park Alexander Garthwaite Irfan Ahmad
USENIX Conference on File and Storage Technologies February 17, 2015
CloudPhysics, Inc.
Mo0va0on
• Cache performance highly non-‐linear • Benefit varies widely by workload • Opportunity: dynamic cache management
– Efficient sizing, allocaSon, and scheduling – Improve performance, isolaSon, QoS
• Problem: online modeling expensive – Too resource-‐intensive to be broadly pracScal – Exacerbated by increasing cache sizes
CloudPhysics, Inc. USENIX FAST '15 2
Modeling Cache Performance
• Miss RaSo Curve (MRC) – Performance as f (size) – Working set knees – Inform allocaSon policy
• Reuse distance – Unique intervening blocks between use and reuse
– LRU, stack algorithms ���
���
���
���
���
�� �� ��� ��� ���
����������
���������������
USENIX FAST '15 3 CloudPhysics, Inc.
MRC Algorithm Research
1970 1975 1980 1985 1990 1995 2000 2005 2010 2015 1965
separate simulation per cache size
Bennett & Kruskal balanced tree
O(N), O(N log N)
Olken tree of unique refs
O(M), O(N log M)
SHARDS spatial hashing
Counter Stacks probabilistic counters
O(1), O(N)
O(log M), O(N log M)
PARDA parallelism
UMON-DSS hw set sampling
RapidMRC on-off periods
Kessler, Hill & Wood set, time sampling
Bryan & Conte cluster sampling
USENIX FAST '15 4
Mattson Stack Algorithm single pass
O(M), O(NM)
Space, Time Complexity N = total refs, M = unique refs
CloudPhysics, Inc.
Key Idea
• Track only a small subset of blocks – Filter input to exisSng algorithm – Run full algorithm, using only sampled blocks – Cheap/accurate enough for pracScal online MRCs?
• SHARDS approximaSon algorithm – Randomized spaSal sampling – Uses hashing to capture all reuses of same block – High performance in Sny constant footprint – Surprisingly accurate MRCs
USENIX FAST '15 5 CloudPhysics, Inc.
Spa0ally Hashed Sampling
USENIX FAST '15 6
Li
hash(Li) mod P
Ti
randomize
process
skip
no
yes
sample?
< T
CloudPhysics, Inc.
subset inclusion property maintained as R is lowered T
adjustable threshold
0 P
sampled unsampled sampling rate R = T / P
Basic SHARDS
USENIX FAST '15 7
Each sample staSsScally represents 1 /R blocks Scale up reuse distances by same factor
CloudPhysics, Inc.
randomize sample?
Li
hash(Li) mod P
Ti < T yes
compute distance
Standard Reuse Distance
Algorithm
scale up
÷ R
skip
no
SHARDS in Constant Space
USENIX FAST '15 8
randomize scale up sample? compute distance
Li
hash(Li) mod P
Ti < T yes Standard
Reuse Distance Algorithm
÷ R
sample set
CloudPhysics, Inc.
evict samples to bound set size
lower threshold T = Tmax reduces rate R = T / P
T 0 P
Tmax
Example SHARDS MRCs
• Block I/O trace t04 – ProducSon VM disk – 69.5M refs, 5.2M unique
• Sample size smax – Vary from 128 to 32K – smax ≥ 2K very accurate
• Small constant footprint • SHARDSadj adjustment
USENIX FAST '15 9
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
0 20 40 60 80Cache Size (GB)
Mis
s R
atio
Sample Size (smax)
Exact MRC32K8K2K512128
CloudPhysics, Inc.
Dynamic Rate Adapta0on
• Adjust sampling rate – Start with R = 0.1 – Lower R as M increases – Shape depends on trace
• Rescale histogram counts – Discount evicted samples – Correct relaSve weighSng – Scale by Rnew / Rold
USENIX FAST '15 10
�������
������
�����
����
�� ��� ��� ��� ��� ���
�����������������
���������������������
������������
CloudPhysics, Inc.
Experimental Evalua0on
• Data collecSon – SaaS caching analyScs – Remotely stream VMware vscsiStats
• 124 trace files – 106 week-‐long traces CloudPhysics customers
– 12 MSR and 6 FIU traces SNIA IOTTA
• LRU, 16 KB block size
USENIX FAST '15 11 CloudPhysics, Inc.
Exact MRCs vs. SHARDS
USENIX FAST '15 12
msr_mds (1.10%) msr_proj (0.06%) msr_src1 (0.06%) t01 (0.05%)
t06 (0.33%) t08 (0.04%) t14 (0.38%) t15 (0.10%)
t18 (0.08%) t19 (0.06%) t30 (0.06%) t32 (0.98%)
0.0
0.5
1.0
0.0
0.5
1.0
0.0
0.5
1.0
0 40 80 0 500 1000 0 200 0 200 400
0 50 100 0 300 600 0 100 200 300 0 200 400
0 100 200 0 200 400 0 100 200 300 0 9 18Cache Size (GB)
Mis
s R
atio
smax = 8K exact MRC
CloudPhysics, Inc.
Error Analysis
• Mean Absolute Error (MAE) – | exact – approx | – Average over all cache sizes
• Full set of 124 traces • Error ∝ 1 / √smax
• MAE for smax = 8K – 0.0027 median – 0.0171 worst-‐case
USENIX FAST '15 13
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●
●●
●
●●
●
●
●
●●●●●●●●●●●●●●●●●●●
●
●● ●
●
●●●●●●●
●●●●●●●●●
●●●●●●●●
●●●
●●●●
0.00
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
0.09
0.10
256 512 1K 2K 4K 8K 16K 32KSample Size (smax)
Mea
n A
bsol
ute
Erro
r (M
AE)
CloudPhysics, Inc.
Memory Footprint
0.1
1
10
100
1000
10000
100000
0 20 40 60 80 100 120
Mem
ory
Usa
ge (M
B)
Trace Number
Baseline (unsampled) SHARDS R = 0.010 SHARDS R = 0.001 SHARDS Smax = 8K
• Full set of 124 traces • SequenSal PARDA • Basic SHARDS
– Modified PARDA – Memory ≈ R × baseline for larger traces
• Fixed-‐size SHARDS – New space-‐efficient code – Constant 1 MB footprint
USENIX FAST '15 14 CloudPhysics, Inc.
Processing Time
0.01
0.1
1
10
100
1000
10000
100000
0 20 40 60 80 100 120
CPU
Usa
ge (s
ec)
Trace Number
Baseline (unsampled) SHARDS R = 0.010 SHARDS R = 0.001 SHARDS Smax = 8K
• Full set of 124 traces • SequenSal PARDA • Basic SHARDS
– Modified PARDA – R=0.001 speedup 41–1029×
• Fixed-‐size SHARDS – New space-‐efficient code – Overhead for evicSons – Smax= 8K speedup 6–204×
USENIX FAST '15 15 CloudPhysics, Inc.
Counter Stacks Comparison Algorithm Memory
(MB) Throughput (Mrefs/sec)
Error (MAE)
Counter Stacks 80.0 2.3 0.0025 SHARDS Smax=32K 2.0 16.9 0.0026 SHARDS Smax=8K 1.3 17.6 0.0061
• QuanStaSve – Same merged MSR “master” trace – Counter Stacks roughly 7× slower, 40–62× bigger
• QualitaSve – Counter Stacks checkpoints support splicing/merging – SHARDS maintains block ids, generalizes to non-‐LRU
USENIX FAST '15 16 CloudPhysics, Inc.
Generalizing to Non-‐LRU Policies
• Many sophisScated replacement policies – ARC, LIRS, CAR, CLOCK-‐Pro, … – AdapSve, frequency and recency – No known single-‐pass MRC methods!
• SoluSon: efficient scaled-‐down simulaSon – Filter using spaSally hashed sampling – Scale down simulated cache size by sampling rate – Run full simulaSon at each cache size
• Surprisingly accurate results USENIX FAST '15 17 CloudPhysics, Inc.
Scaled-‐Down Simula0on Examples
ARC — MSR-‐Web Trace CLOCK-‐Pro — Trace t04
USENIX FAST '15 18
��
����
����
����
����
��
�� ��� ��� ��� ��� ��� ��� ��� ���
����������
���������������
������������������������������
���������
��
����
����
����
����
����
����
����
����
�� ��� ��� ��� ��� ��� ��� ��� ���
����������
���������������
���������������������������������������������
CloudPhysics, Inc.
Conclusions
• New SHARDS algorithm – Approximate MRC in O(1) space, O(N) Sme – Excellent accuracy in 1 MB footprint
• PracScal online MRCs – Even for memory-‐constrained drivers, firmware – So lightweight, can run mulSple instances
• Scaled-‐down simulaSon of non-‐LRU policies
USENIX FAST '15 19 CloudPhysics, Inc.
Ques0ons?
• {carl,nohhyun,alex,irfan}@cloudphysics.com • Visit our poster • BoF 9-‐10pm tonight in Bayshore West
CloudPhysics, Inc. USENIX FAST '15 20
• PotenSal academic and industry collaboraSon • ApplicaSon areas include capacity planning, dynamic parSSoning, tuning, policies, …
• We’re also hiring!