bobtail:)avoiding)long)tails)in) the)cloud)) · use of the boeing logo and name the boeing logo and...

27
Bobtail: Avoiding Long Tails in the Cloud Yunjing Xu, Zachary Musgrave, Brian Noble, Michael Bailey University of Michigan 1

Upload: others

Post on 14-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Bobtail:  Avoiding  Long  Tails  in  the  Cloud    

Yunjing Xu, Zachary Musgrave, Brian Noble, Michael Bailey

University  of  Michigan    

1  

Page 2: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Public  clouds  are  used  everywhere    

Craig  Labovitz,  Deep  Field,  April  2012  

Amazon’s  Cloud  

2  

Page 3: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Challenges  to  large  applicaHons  

•  User-­‐facing  applicaHons  have  Hght  deadlines  – E.g.,  300ms  for  end-­‐to-­‐end  response  Hme  

•  Large  applicaHons  have  more  moving  parts  – A  single  Amazon  page  requires  over  150  services  – 10ms  budget  for  individual  services  

•  The  tail  latency  maOers,  not  just  the  average  – Easy  to  miss  deadline  if  any  service  becomes  slow  – More  criHcal  as  you  scale  out  further  

3  

Page 4: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Agenda  

•  EC2  has  long  tail  latency  problem  – Much  worse  than  that  in  dedicated  data  centers  – A  problem  of  the  node  not  the  network  

•  Root  cause:  incompaHble  sharing  – Co-­‐scheduling  of  I/O-­‐bound  and  CPU-­‐bound  VMs  

•  SoluHon:  Bobtail  – Pick  VMs  without  long  tails  – A  customer-­‐centric  approach  

4  

Page 5: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Agenda  

•  EC2  has  long  tail  latency  problem  – Much  worse  than  that  in  dedicated  data  centers  – A  problem  of  the  node  not  the  network  

•  Root  cause:  incompaHble  sharing  – Co-­‐scheduling  of  I/O-­‐bound  and  CPU-­‐bound  VMs  

•  SoluHon:  Bobtail  – Pick  VMs  without  long  tails  – A  customer-­‐centric  approach  

5  

Page 6: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Measurements  Setup  

•  Live  measurement  in  EC2’s  east  region  •  Use  ThriY  RPC  framework  to  measure  RTT  

•  EXP1:  RTT  for  60  RPC  servers  – Measure  from  designated  tesHng  nodes  – Give  an  overview  of  the  problem  

•  EXP2:  Pair-­‐wise  RTT  for  16  RPC  servers  – Measure  between  each  other  – Study  spaHal  properHes  

6  

Page 7: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

How  bad  is  the  latency  tail?  

7  

99.9th  ~30ms  

Page 8: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

How  bad  is  the  latency  tail?  

8  

99.9th  ~30ms  

Median  ~0.6ms  

Over  10X  larger  

Page 9: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

How  does  it  compare  to  dedicated  DCs?  

9  

99.9th  ~30ms  

   99.9th  ~13.5ms  

Over  2  Hmes  larger  

Measured  in  dedicated  DC[1,29]  

Page 10: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

SpaHal  ProperHes  

10  

99.9th  (ms)  

A  property  of  nodes  not  the  network  

Depends  only  on  servers  

Client  

Server  

Page 11: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Agenda  

•  EC2  has  long  tail  latency  problem  – Worse  than  that  in  dedicated  data  centers  – A  problem  of  the  node  not  the  network  

•  Root  cause:  incompaHble  sharing  – Co-­‐scheduling  of  I/O-­‐bound  and  CPU-­‐bound  VMs  

•  SoluHon:  Bobtail  – Pick  VMs  without  long  tails  – A  customer-­‐centric  approach  

11  

Page 12: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Root  cause  analysis  

•  VirtualizaHon  and  sharing  cause  high  variance  – Wang  et  al.,  INFOCOM’10  

•  Does  the  type  of  sharing  maOer?  Why?  •  Controlled  experiments  on  testbed:  Xen  4.1  

– Five  VMs  share  two  CPU  cores:  40%  each  

•  Vary  workload  combinaHon  – CPU-­‐bound  workload:  ~85%  busy  –  I/O-­‐bound  workload:  same  RPC  framework  

12  

Page 13: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

5  I/O-­‐bound  VMs  on  2  CPU  cores  

13  

Page 14: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

4  I/O-­‐bound  VMs  on  2  CPU  cores  

14  

Page 15: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

3  I/O-­‐bound  VMs  on  2  CPU  cores  

15  

Page 16: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

2  I/O-­‐bound  VMs  on  2  CPU  cores  

16  

Page 17: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

1  I/O-­‐bound  VMs  on  2  CPU  cores  

17  

Page 18: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

The  type  of  sharing  maOers  

•  IncompaHble  sharing  causes  the  long  tail  – CPU-­‐bound  VMs  vs.  I/O-­‐bound  VMs  

•  Sharing  is  not  always  bad  – Only  when  #  of  CPU-­‐bound  VMs  >=  #  of  CPU  cores  

•  Sharing  with  I/O-­‐bound  VMs  is  okay  – Even  when  there  is  one  CPU-­‐bound  VM  mixed  

18  

Page 19: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Why  the  type  of  sharing  maOers  

•  What  causes  large  latency  –  CPU-­‐bound  VMs  occupy  CPUs  when  requests  arrive  –  Interrupt  processing  for  I/O-­‐bound  VMs  is  delayed  

•  Why  30ms  for  the  99.9th  percenHle  – Default  Hme  slice  –  Could  be  reduced  at  the  cost  of  CPU  throughput  

•  Why  the  tail  only  depends  on  servers  –  Servers  may  need  scheduling  when  a  request  comes  

19  

Page 20: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Agenda  

•  EC2  has  long  tail  latency  problem  – Much  worse  than  that  in  dedicated  data  centers  – A  problem  of  the  node  not  the  network  

•  Root  cause:  incompaHble  sharing  – Co-­‐scheduling  of  I/O-­‐bound  and  CPU-­‐bound  VMs  

•  SoluHon:  Bobtail  – Pick  VMs  without  long  tails  – A  customer-­‐centric  approach  

20  

Page 21: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Deal  with  it  

•  Key  insight  – CPU-­‐bound  VMs  delay  interrupt  processing  – Any  interrupt  suffers;  not  just  network  I/O  – Does  tesHng  locally  using  Hme  interrupt  

•  Bobtail:  how  it  works  – Launch  N*K  instances  if  N  needed  –  one  Hme  cost  – Sleep  0.1ms  to  check  if  overslept  (e.g.,  >10ms)  – Pick  the  ones  that  rarely  oversleep  –  kill  others  

21  

Page 22: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

EvaluaHon  

•  Bobtail  in  EC2  –  Using  two  availability  zones  in  the  east  region  

•  Use  40  small  instances  –  Bobtail  picks  40  out  of  160  candidates  –  Baseline:  40  launched  directly  by  EC2  

•  ParHHon-­‐AggregaHon  model  [Zats  et  al.,  SIGCOMM’12]  –  Call  all  severs  in  parallel  and  wait  for  the  last  to  respond  –  The  slowest  determines  the  transacHon  response  Hme  

•  Other  benchmarks  have  beOer  results  –  Refer  to  the  paper  

22  

Page 23: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

ParHHon-­‐AggregaHon  Workflows  

10  servers  

20  servers  

40  servers  

23  

Page 24: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

ParHHon-­‐AggregaHon  Workflows  

10  servers  

20  servers  

40  servers  

24  

Page 25: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Discussion  

•  What  if  everyone  uses  Bobtail  – Not  exclusively  looking  for  empty  space  – Seek  to  share  with  compaHble  workloads  – Emerging  parHHons  –  good  for  everyone    

•  Amazon  could  fix  the  problem  – Reduce  VM  sharing  –  lower  VM  density  – Sacrifice  CPU  efficiency  –  inherent  trade-­‐off  – Smarter  scheduling  –  do  you  trust  users  

25  

Page 26: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Conclusion  

•  Long  tail  latency  problems  EC2  – Large-­‐scale  measurement  in  real  cloud  – Controlled  experiments  on  local  testbed  

•  Root  cause  of  the  long  tail  problem  – Co-­‐scheduling  of  incompaHble  workloads  – CPU-­‐bound  vs.  I/O-­‐bound    

•  Bobtail:  avoid  VMs  with  long  tails  – A  scalable  approach  – A  user-­‐centric  approach  without  providers’  help  

26  

Page 27: Bobtail:)Avoiding)Long)Tails)in) the)Cloud)) · Use of the Boeing logo and name The Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must

Acknowledgements  

And  of  course  our  research  sponsors  and  collaborators:  

Boeing Corporate Identity Program New: July 22, 2008

Corporate Identity Style Guide | Document template 5

Boeing trademarks overview

A trademark is a word, name, symbol, logo, slogan, shape, or sound that identifies and distinguishes a company’s products or services from those of other parties. Our company’s marks are a promise. They symbolize in the minds of consumers the goodwill that Boeing has built over the decades and reflect the quality of our products and services.

Treating our trademarks with careTo maintain rights in our trademarks, we must use them correctly and consistently. This defends Boeing trademarks from misuse.

Use of the Boeing logo and nameThe Boeing logo and the name Boeing that is used in the Team Boeing wordmark are trademarks that must be treated with care. This document provides guidelines for the use of the Boeing logo in graphics, in conjunction with the FIRST logo, or in co-displays with other team sponsors.

Boeing logo

Team Boeing wordmark

hOp://nsrg.eecs.umich.edu  NSRG

The  Students  and  Collaborators  from  the  Networking  and  Security  Research  Group: