
Ensuring QoS in Multi-tenant Hadoop Environments
Eliminate contention and guarantee SLAs

Sean Suchter
CEO & Co-founder, Pepperdata


4 QoS use cases

1. Queue vs. queue (scheduler QoS)
2. HBase vs. ad hoc
3. ETL vs. ad hoc (MapReduce vs. MapReduce)
4. Spikes in swapping


Situation 1: Queue vs. Queue


1. Queue vs. queue

Preemption is the current fix for queue vs. queue (i.e., scheduler) contention.

[Figure: container allocation over 15, 45, and 75 minutes for APP 1 (Q1) and APP 2 (Q2), without and with preemption, shown against the number of containers guaranteed for Q1 and for Q2.]
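Scheduler preemption is a stock YARN feature. As a point of reference, a minimal sketch of enabling it for the Capacity Scheduler in yarn-site.xml follows; the property names are from standard Hadoop 2.x, while thresholds, intervals, and per-queue settings vary by distribution (the Fair Scheduler uses its own switch, yarn.scheduler.fair.preemption).

    <!-- yarn-site.xml: turn on the preemption monitor for the Capacity Scheduler.
         Sketch only; preemption thresholds and intervals need tuning per cluster. -->
    <property>
      <name>yarn.resourcemanager.scheduler.monitor.enable</name>
      <value>true</value>
    </property>
    <property>
      <name>yarn.resourcemanager.scheduler.monitor.policies</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
    </property>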


Preemption balances container/task counts, but…

• Preemption requires killing containers to start new ones
  • Wastes work

• Preemption only balances containers among queues
  • Does not balance disk I/O, network, etc.


Preemption balances containers, not all hardware use

[Figure: an even split of containers between jobs, yet different disk usage, different network usage, and different CPU usage.]
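To make that concrete: container counts say nothing about what each node's hardware is actually doing. A minimal Python monitoring sketch (assuming the psutil package; the metric names are illustrative, not Pepperdata's) samples the node-wide counters that an even container split does not balance.

    import psutil  # assumed dependency for node-wide hardware counters

    def hardware_snapshot(interval_s=5):
        """Sample node-wide CPU, disk, and network usage over interval_s seconds."""
        disk0, net0 = psutil.disk_io_counters(), psutil.net_io_counters()
        cpu = psutil.cpu_percent(interval=interval_s)  # blocks for interval_s seconds
        disk1, net1 = psutil.disk_io_counters(), psutil.net_io_counters()
        return {
            "cpu_percent": cpu,
            "disk_bytes_per_s": (disk1.read_bytes + disk1.write_bytes
                                 - disk0.read_bytes - disk0.write_bytes) / interval_s,
            "net_bytes_per_s": (net1.bytes_sent + net1.bytes_recv
                                - net0.bytes_sent - net0.bytes_recv) / interval_s,
        }

    print(hardware_snapshot())

Two nodes running the same number of containers can report very different numbers here, which is exactly the gap preemption leaves open.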


Situation 2: HBase vs. Ad Hoc


2. HBase vs. ad hoc job queries

[Diagram: millions of devices write into HBase on Hadoop/HDFS under a sub-second, business-critical SLA, while analysts submit ad hoc MapReduce queries with no SLA against the same cluster.]

The devices writing in need a lot of disk bandwidth, but expensive ad hoc queries come in and use up that bandwidth.


What might you try…

• Scale out HBase
• Use HBase online snapshots
• Have a separate HBase cluster
• Physically separate HBase and MapReduce


With Pepperdata, running jobs are actively managed in real time and can be prioritized, even with simultaneous workloads.

[Before/after chart: low-priority batch jobs vs. the high-priority HBase workload.]

Problem solved: important jobs now get priority
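How Pepperdata enforces this is not shown on the slide; purely as a generic illustration of the underlying idea of capping what low-priority work may consume, the Python sketch below rate-limits ad hoc reads with a token bucket so the HBase ingest path keeps most of the disk bandwidth. The TokenBucket class and the 20 MB/s budget are hypothetical.

    import time

    class TokenBucket:
        """Cap the bytes/s a low-priority reader may consume (illustration only)."""
        def __init__(self, rate_bytes_per_s, burst_bytes):
            self.rate, self.capacity = rate_bytes_per_s, burst_bytes
            self.tokens, self.last = burst_bytes, time.monotonic()

        def consume(self, nbytes):
            """Block until nbytes of bandwidth budget is available."""
            while True:
                now = time.monotonic()
                self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= nbytes:
                    self.tokens -= nbytes
                    return
                time.sleep((nbytes - self.tokens) / self.rate)

    # Ad hoc scans on this node get at most ~20 MB/s, leaving the rest for the HBase write path.
    adhoc_budget = TokenBucket(rate_bytes_per_s=20 * 1024**2, burst_bytes=4 * 1024**2)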


Situation 3: ETL vs. Ad Hoc


3. ETL vs. ad hoc (aka MapReduce vs. MapReduce)

• Customer example: an online provider of real estate data
  • Hundreds of nightly jobs that have to run, with hard-and-fast SLAs
  • Pulling from many data sources (batch and streaming), in real time
  • Needs to ensure SLAs for business-critical jobs


Problem solved: ETL variance eliminated

                      Run 1   Run 2   Run 3
Without Pepperdata      194     230     308
With Pepperdata         203     214     258
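A quick read of the numbers above (units as reported on the slide): the run-to-run spread roughly halves, which is the variance reduction the heading refers to.

    without = [194, 230, 308]  # runs without Pepperdata, from the table above
    with_pd = [203, 214, 258]  # runs with Pepperdata
    print(max(without) - min(without))  # 114
    print(max(with_pd) - min(with_pd))  # 55, roughly half the spread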


Situation 4: Spikes in Swapping


4. Swapping gets out of control

[Diagram: processes P1 and P2 in user space; the operating system swaps pages out of main memory to disk and back in.]

When memory is no longer sufficient, swapping occurs. Rolling pages out (to disk) and back in (to memory) drastically increases the time for a context switch.
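On Linux, that swap activity is visible in /proc/vmstat as the cumulative pswpin/pswpout page counters. A minimal sketch of turning them into rates (the helper name is ours, not part of any Hadoop tooling):

    import time

    def swap_rates(interval_s=5):
        """Pages swapped in/out per second, from the cumulative counters in /proc/vmstat."""
        def read_vmstat():
            with open("/proc/vmstat") as f:
                return {key: int(value) for key, value in (line.split() for line in f)}
        before = read_vmstat()
        time.sleep(interval_s)
        after = read_vmstat()
        return {
            "pages_swapped_in_per_s": (after["pswpin"] - before["pswpin"]) / interval_s,
            "pages_swapped_out_per_s": (after["pswpout"] - before["pswpout"]) / interval_s,
        }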


Problem solved: extreme swapping is eliminated

1. A change in the job mix led to heavier memory usage and swapping
2. Pepperdata detected the excessive swapping
3. Pepperdata told the scheduler to slow down container creation (sketched below)
4. Extreme swapping eliminated
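Pepperdata's implementation is proprietary; purely to illustrate the feedback loop in steps 2-4, a control-logic sketch might look like the following, where the threshold and the pause/resume hooks are hypothetical placeholders.

    import time

    SWAP_PAGES_PER_S_LIMIT = 500  # assumed threshold for "excessive" swapping

    def control_loop(get_swap_out_rate, pause_container_launches, resume_container_launches):
        """Hold back new container launches while the node is swapping heavily."""
        throttled = False
        while True:
            rate = get_swap_out_rate()          # e.g. pages_swapped_out_per_s from the earlier sketch
            if rate > SWAP_PAGES_PER_S_LIMIT and not throttled:
                pause_container_launches()      # step 3: tell the scheduler to slow down
                throttled = True
            elif rate <= SWAP_PAGES_PER_S_LIMIT and throttled:
                resume_container_launches()     # node has recovered: allow new containers again
                throttled = False
            time.sleep(5)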


Summary

• QoS is imperative for guaranteeing SLAs in multi-tenant, multi-workload Hadoop environments.

• Hadoop deployments of all sizes need QoS, not just huge clusters!

• Traditional solutions (e.g., preemption) do not solve real-time contention problems or monitor actual hardware utilization.

• To guarantee QoS for Hadoop, you need a real-time, dynamic solution that actively reshapes cluster activity.


THANK YOU


Appendix