Efficient MRC Construction with SHARDS. Carl Waldspurger, Nohhyun Park, Alexander Garthwaite, Irfan Ahmad. USENIX Conference on File and Storage Technologies, February 17, 2015. CloudPhysics, Inc.


Page 1

Efficient MRC Construction with SHARDS

Carl Waldspurger, Nohhyun Park, Alexander Garthwaite, Irfan Ahmad

USENIX Conference on File and Storage Technologies, February 17, 2015

CloudPhysics, Inc.

Page 2

Motivation

•  Cache performance highly non-linear
•  Benefit varies widely by workload
•  Opportunity: dynamic cache management
   –  Efficient sizing, allocation, and scheduling
   –  Improve performance, isolation, QoS
•  Problem: online modeling expensive
   –  Too resource-intensive to be broadly practical
   –  Exacerbated by increasing cache sizes

CloudPhysics,  Inc.   USENIX  FAST  '15   2  

Page 3

Modeling Cache Performance

•  Miss Ratio Curve (MRC)
   –  Performance as f(size)
   –  Working set knees
   –  Inform allocation policy
•  Reuse distance
   –  Unique intervening blocks between use and reuse
   –  LRU, stack algorithms
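The reuse-distance definition on this slide can be illustrated with a small sketch. It is a deliberately naive Python illustration for LRU, not the authors' code; the names `reuse_distances` and `miss_ratio_curve` are hypothetical, and the trace is assumed to be a non-empty sequence of hashable block ids.

```python
from collections import OrderedDict

def reuse_distances(trace):
    """Yield the reuse distance of each reference (None on first use).

    The distance counts the unique blocks touched between a use and its
    reuse. This version scans an LRU stack, so it costs O(M) per reference.
    """
    stack = OrderedDict()  # keys ordered from least to most recently used
    for block in trace:
        if block in stack:
            keys = list(stack)
            # Blocks touched since the previous use sit after it in the stack.
            yield len(keys) - 1 - keys.index(block)
            stack.move_to_end(block)
        else:
            yield None
            stack[block] = True

def miss_ratio_curve(trace, cache_sizes):
    """Return {cache size in blocks: LRU miss ratio} from a single pass."""
    dists = list(reuse_distances(trace))
    total = len(dists)
    curve = {}
    for size in cache_sizes:
        # A reference hits in an LRU cache of `size` blocks when its reuse
        # distance is defined and smaller than `size`.
        hits = sum(1 for d in dists if d is not None and d < size)
        curve[size] = (total - hits) / total
    return curve
```

Because LRU is a stack algorithm, one list of distances answers the hit-or-miss question for every cache size at once, which is what makes a single-pass MRC possible.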

Page 4

MRC Algorithm Research

Timeline axis: 1965 to 2015. Where given, complexity is listed as space, time, with N = total refs and M = unique refs.

•  Separate simulation per cache size
•  Mattson Stack Algorithm (single pass): O(M), O(NM)
•  Bennett & Kruskal (balanced tree): O(N), O(N log N)
•  Olken (tree of unique refs): O(M), O(N log M)
•  Kessler, Hill & Wood (set, time sampling)
•  Bryan & Conte (cluster sampling)
•  UMON-DSS (hw set sampling)
•  RapidMRC (on-off periods)
•  PARDA (parallelism)
•  Counter Stacks (probabilistic counters): O(log M), O(N log M)
•  SHARDS (spatial hashing): O(1), O(N)

Page 5

Key Idea

•  Track only a small subset of blocks
   –  Filter input to existing algorithm
   –  Run full algorithm, using only sampled blocks
   –  Cheap/accurate enough for practical online MRCs?
•  SHARDS approximation algorithm
   –  Randomized spatial sampling
   –  Uses hashing to capture all reuses of same block
   –  High performance in tiny constant footprint
   –  Surprisingly accurate MRCs

Page 6

Spatially Hashed Sampling

•  Randomize: Ti = hash(Li) mod P
•  Sample? Ti < T
   –  yes: process
   –  no: skip
•  Adjustable threshold T on the range 0 to P
   –  Values below T are sampled, the rest unsampled
   –  Sampling rate R = T / P
•  Subset inclusion property maintained as R is lowered
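The sampling test on this slide can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the modulus value `P = 2**24`, the use of BLAKE2b as the hash, and the names `location_hash`, `is_sampled`, and `sampled_set` are all choices made for the example.

```python
import hashlib

P = 1 << 24  # hash modulus; an assumed value for this sketch

def location_hash(location):
    """Map a block location L_i to a pseudo-random value T_i in [0, P)."""
    digest = hashlib.blake2b(str(location).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % P

def is_sampled(location, threshold):
    """Sample the location when T_i = hash(L_i) mod P falls below T."""
    return location_hash(location) < threshold

def sampled_set(locations, rate):
    """Return the locations selected at sampling rate R = T / P."""
    threshold = int(rate * P)
    return {loc for loc in locations if is_sampled(loc, threshold)}
```

The decision depends only on the location, so every reuse of a sampled block is also sampled. Lowering the threshold can only remove locations from the sampled set, which is the subset inclusion property named on the slide.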

Page 7

Basic SHARDS

•  Randomize: Ti = hash(Li) mod P
•  Sample? Ti < T
   –  yes: compute distance with a standard reuse distance algorithm, then scale up (÷ R)
   –  no: skip
•  Each sample statistically represents 1/R blocks
•  Scale up reuse distances by same factor
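The pipeline on this slide (filter, compute distance, divide by R) might look roughly like the following Python sketch. It substitutes a naive LRU-stack scan for the "standard reuse distance algorithm", and the modulus `P`, the BLAKE2b hash, and the names `hash_mod_p`, `basic_shards`, and `miss_ratio` are assumptions made for illustration.

```python
import hashlib
from collections import OrderedDict

P = 1 << 24  # hash modulus (assumed value)

def hash_mod_p(location):
    digest = hashlib.blake2b(str(location).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % P

def basic_shards(trace, rate):
    """Return one scaled reuse distance per sampled reference."""
    threshold = int(rate * P)      # T = R * P
    stack = OrderedDict()          # sampled blocks, LRU to MRU
    scaled = []
    for location in trace:
        if hash_mod_p(location) >= threshold:
            continue               # skip: not in the sampled subset
        if location in stack:
            keys = list(stack)
            distance = len(keys) - 1 - keys.index(location)
            scaled.append(distance / rate)   # scale up: divide by R
            stack.move_to_end(location)
        else:
            scaled.append(float("inf"))      # first use is always a miss
            stack[location] = True
    return scaled

def miss_ratio(scaled, cache_size):
    """Fraction of sampled references whose scaled distance misses."""
    if not scaled:
        return None
    return sum(1 for d in scaled if d >= cache_size) / len(scaled)
```

With `rate=1.0` nothing is filtered and the distances are exact. At lower rates only the sampled blocks occupy the stack, so memory and time shrink roughly in proportion to R.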

Page 8

SHARDS in Constant Space

•  Randomize: Ti = hash(Li) mod P
•  Sample? Ti < T
   –  yes: add to sample set, compute distance with a standard reuse distance algorithm, then scale up (÷ R)
•  Evict samples to bound set size
•  Lower threshold T = Tmax, which reduces rate R = T / P
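One way to picture the bounded sample set is the Python sketch below. It keeps at most `s_max` locations and, on overflow, lowers the threshold to the largest hash value currently held and evicts everything at or above it. The class name `FixedSizeSampleSet`, the max-heap bookkeeping, the modulus `P`, and the BLAKE2b hash are assumptions for illustration; the slide does not specify a data structure.

```python
import hashlib
import heapq
import itertools

P = 1 << 24  # hash modulus (assumed value)

def hash_mod_p(location):
    digest = hashlib.blake2b(str(location).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % P

class FixedSizeSampleSet:
    """Hypothetical sample set holding at most s_max locations."""

    def __init__(self, s_max, initial_rate=0.1):
        self.s_max = s_max
        self.threshold = int(initial_rate * P)   # T
        self.members = {}                        # location -> T_i
        self.heap = []                           # max-heap via negated T_i
        self._seq = itertools.count()            # tie-breaker for the heap

    @property
    def rate(self):
        return self.threshold / P                # R = T / P

    def offer(self, location):
        """Return True if this reference should be processed."""
        t_i = hash_mod_p(location)
        if t_i >= self.threshold:
            return False
        if location not in self.members:
            self.members[location] = t_i
            heapq.heappush(self.heap, (-t_i, next(self._seq), location))
            if len(self.members) > self.s_max:
                self._lower_threshold()
        return location in self.members

    def _lower_threshold(self):
        # New threshold is the largest hash in the set (T = Tmax); every
        # sample at or above it is evicted, which lowers R = T / P.
        self.threshold = -self.heap[0][0]
        while self.heap and -self.heap[0][0] >= self.threshold:
            _, _, victim = heapq.heappop(self.heap)
            del self.members[victim]
```

In a complete implementation the evicted locations would also have to be removed from whatever structure computes reuse distances; this sketch only tracks membership and the threshold.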

Page 9

Example SHARDS MRCs

•  Block I/O trace t04
   –  Production VM disk
   –  69.5M refs, 5.2M unique
•  Sample size smax
   –  Vary from 128 to 32K
   –  smax ≥ 2K very accurate
•  Small constant footprint
•  SHARDSadj adjustment

[Figure: miss ratio (0.0 to 1.0) vs. cache size (0 to 80 GB); curves for the exact MRC and for sample sizes smax = 32K, 8K, 2K, 512, 128.]

Page 10

Dynamic Rate Adaptation

•  Adjust sampling rate
   –  Start with R = 0.1
   –  Lower R as M increases
   –  Shape depends on trace
•  Rescale histogram counts
   –  Discount evicted samples
   –  Correct relative weighting
   –  Scale by Rnew / Rold
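The rescaling step can be sketched as follows. The histogram is assumed to be a plain dict from scaled distance to a fractional count, and `record` and `rescale_histogram` are hypothetical names; the only part taken from the slide is the factor Rnew / Rold.

```python
def record(histogram, distance, rate):
    """Add one sampled reference, with its distance scaled up by 1/R."""
    bucket = round(distance / rate)
    histogram[bucket] = histogram.get(bucket, 0.0) + 1.0

def rescale_histogram(histogram, r_old, r_new):
    """Scale all existing counts by R_new / R_old after R is lowered.

    Counts gathered at the old, higher rate would otherwise outweigh
    counts gathered later at the new, lower rate.
    """
    factor = r_new / r_old
    return {bucket: count * factor for bucket, count in histogram.items()}
```

For example, two references recorded at R = 0.5 carry a combined weight of 1.0 once the rate drops to 0.25, the same weight as a single reference recorded afterwards.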

Page 11

Experimental Evaluation

•  Data collection
   –  SaaS caching analytics
   –  Remotely stream VMware vscsiStats
•  124 trace files
   –  106 week-long traces (CloudPhysics customers)
   –  12 MSR and 6 FIU traces (SNIA IOTTA)
•  LRU, 16 KB block size

Page 12

Exact MRCs vs. SHARDS

[Figure: twelve panels of miss ratio (0.0 to 1.0) vs. cache size (GB), each comparing SHARDS with smax = 8K against the exact MRC. Panel titles: msr_mds (1.10%), msr_proj (0.06%), msr_src1 (0.06%), t01 (0.05%), t06 (0.33%), t08 (0.04%), t14 (0.38%), t15 (0.10%), t18 (0.08%), t19 (0.06%), t30 (0.06%), t32 (0.98%).]

Page 13

Error Analysis

•  Mean Absolute Error (MAE)
   –  | exact – approx |
   –  Average over all cache sizes
•  Full set of 124 traces
•  Error ∝ 1 / √smax
•  MAE for smax = 8K
   –  0.0027 median
   –  0.0171 worst-case

[Figure: mean absolute error (MAE, 0.00 to 0.10) vs. sample size smax (256 to 32K).]
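The error metric on this slide is simple enough to state as code. The sketch below assumes each MRC is a dict from cache size to miss ratio; `mean_absolute_error` and `scaled_error` are hypothetical names, and `scaled_error` only applies the proportionality shown on the slide, which is a trend rather than a guarantee for any single trace.

```python
import math

def mean_absolute_error(exact, approx):
    """Average |exact - approx| over the cache sizes present in both curves."""
    sizes = sorted(set(exact) & set(approx))
    if not sizes:
        raise ValueError("curves share no cache sizes")
    return sum(abs(exact[s] - approx[s]) for s in sizes) / len(sizes)

def scaled_error(error, s_from, s_to):
    """Extrapolate an error using error proportional to 1 / sqrt(s_max)."""
    return error * math.sqrt(s_from / s_to)
```

Under that proportionality, quadrupling the sample size halves the expected error.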

Page 14

Memory Footprint

[Figure: memory usage (MB, log scale from 0.1 to 100000) vs. trace number (0 to 120); series: Baseline (unsampled), SHARDS R = 0.010, SHARDS R = 0.001, SHARDS Smax = 8K.]

•  Full set of 124 traces
•  Sequential PARDA
•  Basic SHARDS
   –  Modified PARDA
   –  Memory ≈ R × baseline for larger traces
•  Fixed-size SHARDS
   –  New space-efficient code
   –  Constant 1 MB footprint

Page 15

Processing Time

[Figure: CPU usage (sec, log scale from 0.01 to 100000) vs. trace number (0 to 120); series: Baseline (unsampled), SHARDS R = 0.010, SHARDS R = 0.001, SHARDS Smax = 8K.]

•  Full set of 124 traces
•  Sequential PARDA
•  Basic SHARDS
   –  Modified PARDA
   –  R = 0.001 speedup 41–1029×
•  Fixed-size SHARDS
   –  New space-efficient code
   –  Overhead for evictions
   –  Smax = 8K speedup 6–204×

Page 16

Counter Stacks Comparison

Algorithm           Memory (MB)   Throughput (Mrefs/sec)   Error (MAE)
Counter Stacks      80.0          2.3                      0.0025
SHARDS Smax=32K     2.0           16.9                     0.0026
SHARDS Smax=8K      1.3           17.6                     0.0061

•  Quantitative
   –  Same merged MSR “master” trace
   –  Counter Stacks roughly 7× slower, 40–62× bigger
•  Qualitative
   –  Counter Stacks checkpoints support splicing/merging
   –  SHARDS maintains block ids, generalizes to non-LRU
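The "roughly 7× slower, 40–62× bigger" summary follows directly from the table. A short Python check, using only the numbers on this slide (the dict layout and the function name `ratios_vs_counter_stacks` are illustrative):

```python
# (memory in MB, throughput in Mrefs/sec), copied from the table
ROWS = {
    "Counter Stacks": (80.0, 2.3),
    "SHARDS Smax=32K": (2.0, 16.9),
    "SHARDS Smax=8K": (1.3, 17.6),
}

def ratios_vs_counter_stacks(name):
    """Return (memory ratio, throughput ratio) relative to Counter Stacks."""
    cs_memory, cs_throughput = ROWS["Counter Stacks"]
    memory, throughput = ROWS[name]
    return cs_memory / memory, throughput / cs_throughput
```

The memory ratios come out at 40 and about 61.5, and the throughput ratios at about 7.3 and 7.7.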

Page 17

Generalizing to Non-LRU Policies

•  Many sophisticated replacement policies
   –  ARC, LIRS, CAR, CLOCK-Pro, …
   –  Adaptive, frequency and recency
   –  No known single-pass MRC methods!
•  Solution: efficient scaled-down simulation
   –  Filter using spatially hashed sampling
   –  Scale down simulated cache size by sampling rate
   –  Run full simulation at each cache size
•  Surprisingly accurate results
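The three steps of scaled-down simulation can be sketched in Python as below. A basic CLOCK (second-chance) cache stands in for the more sophisticated policies named on the slide, purely because it is short; it is not one of the policies the authors list. The modulus `P`, the BLAKE2b hash, and the names `clock_misses` and `scaled_down_mrc` are likewise assumptions.

```python
import hashlib

P = 1 << 24  # hash modulus (assumed value)

def hash_mod_p(location):
    digest = hashlib.blake2b(str(location).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") % P

def clock_misses(trace, capacity):
    """Count misses for a CLOCK cache holding `capacity` (>= 1) blocks."""
    slots = []       # resident blocks in slot order
    ref_bit = {}     # resident block -> reference bit
    hand = 0
    misses = 0
    for block in trace:
        if block in ref_bit:
            ref_bit[block] = True        # hit: grant a second chance
            continue
        misses += 1
        if len(slots) < capacity:
            slots.append(block)
        else:
            # Advance the hand, clearing bits, until a victim is found.
            while ref_bit[slots[hand]]:
                ref_bit[slots[hand]] = False
                hand = (hand + 1) % capacity
            del ref_bit[slots[hand]]
            slots[hand] = block
            hand = (hand + 1) % capacity
        ref_bit[block] = False
    return misses

def scaled_down_mrc(trace, cache_sizes, rate):
    """Estimate {cache size: miss ratio} from a sampled, scaled simulation."""
    threshold = int(rate * P)
    sampled = [b for b in trace if hash_mod_p(b) < threshold]   # filter
    curve = {}
    for size in cache_sizes:
        scaled = max(1, round(size * rate))   # scale cache size by R
        curve[size] = (clock_misses(sampled, scaled) / len(sampled)
                       if sampled else None)
    return curve
```

Each cache size still needs its own simulation run, but every run processes only the sampled references against a cache shrunk by the same factor.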

Page 18

Scaled-Down Simulation Examples

[Figure: two panels, "ARC, MSR-Web Trace" and "CLOCK-Pro, Trace t04"; axis labels and legends not recoverable.]

Page 19

Conclusions

•  New SHARDS algorithm
   –  Approximate MRC in O(1) space, O(N) time
   –  Excellent accuracy in 1 MB footprint
•  Practical online MRCs
   –  Even for memory-constrained drivers, firmware
   –  So lightweight, can run multiple instances
•  Scaled-down simulation of non-LRU policies

Page 20

Questions?

•  {carl,nohhyun,alex,irfan}@cloudphysics.com
•  Visit our poster
•  BoF 9-10pm tonight in Bayshore West
•  Potential academic and industry collaboration
•  Application areas include capacity planning, dynamic partitioning, tuning, policies, …
•  We’re also hiring!