
Transcript
Page 1: YARN

1  

YARN  Alex  Moundalexis    

@technmsg  

Page 2: YARN

CC  BY  2.0  /  Richard  Bumgardner  

Been  there,  done  that.  

Page 3: YARN

3  

• Solutions Architect • AKA consultant • government • Infrastructure

 

Alex  @  Cloudera  


Page 4: YARN

4  

• product • distribution of Hadoop components, Apache licensed • enterprise tooling

• support • training • services (aka consulting) • community

What  Does  Cloudera  Do?  


Page 5: YARN

5  

• Cloudera builds things: software • most donated to Apache • some closed-source

• Cloudera  “products”  I  reference  are  open  source  •  Apache  Licensed  •  source  code  is  on  GitHub  

• https://github.com/cloudera

Disclaimer  


Page 6: YARN

6  

• deploying  •  Puppet,  Chef,  Ansible,  homegrown  scripts,  intern  labor  

• sizing  &  tuning  •  depends  heavily  on  data  and  workload  

• coding  •  line  diagrams  don’t  count  

• algorithms  •  I  suck  at  math,  ask  anyone  

What  This  Talk  Isn’t  About  


Page 7: YARN

7  

• Why YARN? • Architecture • Availability • Resources & Scheduling • MR1 to MR2 Gotchas • History • Interfaces • Applications • Storytime

So  What  ARE  We  Talking  About?    


Page 8: YARN
Page 9: YARN

9  

Why  “Ecosystem?”  

•  In  the  beginning,  just  Hadoop  • HDFS  • MapReduce  

• Today, dozens of interrelated components • I/O • Processing • Specialty Applications • Configuration • Workflow


Page 10: YARN

10  

Partial Ecosystem

[Diagram: Hadoop sits between external systems. Data flows in via log collection (web servers, device logs) and DB table imports from an RDBMS/DWH, and flows out via DB table exports to an RDBMS/DWH and BI tools over JDBC/ODBC. On the cluster: batch processing, machine learning, Search, and SQL, with API access for external systems and users.]


Page 11: YARN

11  

HDFS  

• Distributed, highly fault-tolerant filesystem • Optimized for large streaming access to data • Based on the Google File System

• http://research.google.com/archive/gfs.html
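
To make the "large streaming access" point concrete, here is a minimal client-side sketch (not from the deck) that streams a file out of HDFS with the Java FileSystem API; the path and class name are hypothetical.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsCat {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();      // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);          // the cluster's default filesystem (HDFS)
    try (FSDataInputStream in = fs.open(new Path("/user/alex/example.txt"));  // hypothetical path
         BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
      String line;
      while ((line = reader.readLine()) != null) {  // data streams in; the file is never loaded whole
        System.out.println(line);
      }
    }
  }
}
```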


Page 12: YARN

12  

Lots  of  Commodity  Machines  

Image: Yahoo! Hadoop cluster [OSCON '07]


Page 13: YARN

13  

MapReduce  (MR)  

• Programming paradigm • Batch oriented, not realtime • Works well with distributed computing • Lots of Java, but other languages supported • Based on Google's paper

• http://research.google.com/archive/mapreduce.html


Page 14: YARN

14  

MR1  Components  

• JobTracker • accepts jobs from client • schedules jobs on particular nodes • accepts status data from TaskTrackers

• TaskTracker • one per node • manages tasks • crunches data in-place • reports to JobTracker


Page 15: YARN

15  

Under  the  Covers  


Page 16: YARN

16  

You specify map() and reduce() functions. The framework does the rest.
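
As an illustration of that division of labor, here is a compact word-count job written against the MR2 Java API; it is a generic sketch, not code from this deck. You write the map() and reduce() bodies; splitting the input, shuffling, sorting, and retrying failed tasks are the framework's job.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
  public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {   // emit (word, 1) for every token
        if (!token.isEmpty()) { word.set(token); ctx.write(word, ONE); }
      }
    }
  }

  public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) sum += v.get();            // the framework groups values by key
      ctx.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenMapper.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);         // splits, shuffle, sort, retries: all handled
  }
}
```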


Page 17: YARN

But wait… WHY DO WE NEED THIS?

Page 18: YARN

18

Page 19: YARN
Page 20: YARN

20  

YARN  Yet  Another  Ridiculous  Name  

Page 21: YARN

21  

YARN  Yet  Another  Ridiculous  Name  

Page 22: YARN

22  

YARN Yet Another Resource Negotiator

Page 23: YARN

23  

Why  YARN  /  MR2?  


• Scalability  • JT  kept  track  of  individual  tasks  and  wouldn’t  scale  

• Utilization • All slots are equal even if the work is not equal

• Multi-tenancy • Every framework shouldn't need to write its own execution engine

• All  frameworks  should  share  the  resources  on  a  cluster  

Page 24: YARN

24  

An Operating System?


Traditional Operating System • Storage: File System • Execution/Scheduling: Processes / Kernel Scheduler

Hadoop • Storage: Hadoop Distributed File System (HDFS) • Execution/Scheduling: Yet Another Resource Negotiator (YARN)

Page 25: YARN

25  

Multiple levels of scheduling


• YARN • Which application (framework) to give resources to?

• Application (framework - MR etc.) • Which task within the application should use these resources?

Page 26: YARN
Page 27: YARN

27  

Architecture  


Page 28: YARN

28  

Architecture – running multiple applications


Page 29: YARN

29  

Control Flow: Submit application
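
For readers who want to see what submitting an application looks like in code, below is a minimal, hypothetical sketch using the YarnClient API: the client asks the RM for an application ID, describes the ApplicationMaster's container (command, resources, queue), and submits it. A real submission would also set up local resources, environment, and security tokens.

```java
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.util.Records;

public class SubmitExample {
  public static void main(String[] args) throws Exception {
    YarnClient yarn = YarnClient.createYarnClient();
    yarn.init(new Configuration());
    yarn.start();

    // Step 1: ask the ResourceManager for a new application ID.
    YarnClientApplication app = yarn.createApplication();
    ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
    ctx.setApplicationName("hello-yarn");                     // hypothetical name

    // Step 2: describe the container that will run our ApplicationMaster.
    ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
    amContainer.setCommands(Collections.singletonList("/bin/sleep 60"));  // placeholder AM command
    ctx.setAMContainerSpec(amContainer);
    ctx.setResource(Resource.newInstance(1024, 1));           // 1024 MB, 1 vcore for the AM
    ctx.setQueue("default");

    // Step 3: submit; the RM's scheduler finds a node and launches the AM there.
    ApplicationId appId = yarn.submitApplication(ctx);
    System.out.println("Submitted " + appId);
  }
}
```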


Page 30: YARN

30  

Control Flow: Get application updates


Page 31: YARN

31  

Control  Flow:  AM  asking  for  resources  


Page 32: YARN

32  

Control  Flow:  AM  using  containers  


Page 33: YARN

33  

Execution Modes


• Local  mode  • Uber  mode  • Executors  

• DefaultContainerExecutor  • LinuxContainerExecutor  
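
A small sketch of the knobs behind these modes. The property names below are the standard Hadoop 2.x keys, the values are only illustrative, and the executor choice is a cluster-side yarn-site.xml setting rather than something a job would set.

```java
import org.apache.hadoop.conf.Configuration;

public class ExecutionModeConfig {
  public static Configuration configure() {
    Configuration conf = new Configuration();

    // Local mode: run the whole job in a single local JVM (handy for testing).
    // conf.set("mapreduce.framework.name", "local");

    // Uber mode: run a sufficiently small job's tasks inside the MR AM's own JVM.
    conf.setBoolean("mapreduce.job.ubertask.enable", true);
    conf.setInt("mapreduce.job.ubertask.maxmaps", 9);       // illustrative thresholds
    conf.setInt("mapreduce.job.ubertask.maxreduces", 1);

    // Executor choice (per NodeManager, set in yarn-site.xml), shown for reference only:
    // yarn.nodemanager.container-executor.class =
    //     org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor
    return conf;
  }
}
```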

 

Page 34: YARN
Page 35: YARN

35  

Availability

[Diagram: two ResourceManagers, each with an embedded elector, share state through a ZooKeeper store; NodeManagers register with the active RM, and clients fail over to whichever RM is active.]

Page 36: YARN

36  

Availability – Subtleties


• Embedded  leader  elector  • No  need  for  a  separate  daemon  like  ZKFC  

 • Implicit  fencing  using  ZKRMStateStore  

• Active RM claims exclusive access to the store through ACL magic
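
For reference, a sketch of the yarn-site.xml keys that drive this behavior (property names are the standard Hadoop 2.x ones; the hostnames, cluster ID, and ZooKeeper quorum below are placeholders):

```java
import org.apache.hadoop.conf.Configuration;

public class RmHaConfig {
  public static Configuration configure() {
    Configuration conf = new Configuration();
    conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
    conf.set("yarn.resourcemanager.cluster-id", "my-cluster");           // placeholder
    conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
    conf.set("yarn.resourcemanager.hostname.rm1", "rm1.example.com");    // placeholder hosts
    conf.set("yarn.resourcemanager.hostname.rm2", "rm2.example.com");
    conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181");
    // State-store recovery: the active RM writes application state to ZooKeeper,
    // which also provides the implicit fencing described above.
    conf.setBoolean("yarn.resourcemanager.recovery.enabled", true);
    conf.set("yarn.resourcemanager.store.class",
        "org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore");
    return conf;
  }
}
```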

Page 37: YARN

37  

Availability – Implications


• Previously submitted applications continue to run • New Application Masters are created

• If the AM checkpoints state, it can continue from where it left off • MR keeps track of completed tasks, so they don't have to be re-run

• Future • Work-preserving RM restart / failover

Page 38: YARN

38  

Availability – Implications


• Transparent to clients • RM unavailable for a small duration • Automatic failover to the active RM • Web UI redirects • REST API redirects (starting 5.1.0)

Page 39: YARN
Page 40: YARN

40  

Resource Model and Capacities


• Resource  vectors  • e.g.  1024  MB,  2  vcores,  …  • No  more  task  slots!  

• Nodes specify the amount of resources they have • yarn.nodemanager.resource.memory-mb • yarn.nodemanager.resource.cpu-vcores

• vcores map to physical cores; they're not really "virtual"
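
A short sketch of how a NodeManager's advertised capacity is read. The defaults shown in the comments are the stock Apache Hadoop 2.x values; in practice these keys are set in each node's yarn-site.xml.

```java
import org.apache.hadoop.conf.Configuration;

public class NodeCapacity {
  public static void main(String[] args) {
    Configuration conf = new Configuration();   // loads yarn-site.xml from the classpath
    // What this NodeManager offers to the ResourceManager's scheduler:
    int memoryMb = conf.getInt("yarn.nodemanager.resource.memory-mb", 8192);  // default 8192 MB
    int vcores   = conf.getInt("yarn.nodemanager.resource.cpu-vcores", 8);    // default 8 vcores
    System.out.println("Node offers " + memoryMb + " MB and " + vcores + " vcores");
  }
}
```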

 

Page 41: YARN

41  

Resources  and  Scheduling  


• What you request is what you get • No more fixed-size slots • The framework/application requests resources for a task

• The MR AM requests resources for map and reduce tasks; these requests can potentially be for different amounts of resources
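
Concretely, an MR job can size its map, reduce, and AM containers independently. A hypothetical sketch with the standard MR2 property names (the numbers are arbitrary):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class TaskResourceRequests {
  public static Job buildJob() throws Exception {
    Configuration conf = new Configuration();
    conf.setInt("mapreduce.map.memory.mb", 1024);            // each map container: 1 GB, 1 vcore
    conf.setInt("mapreduce.map.cpu.vcores", 1);
    conf.setInt("mapreduce.reduce.memory.mb", 2048);         // each reduce container: 2 GB, 2 vcores
    conf.setInt("mapreduce.reduce.cpu.vcores", 2);
    conf.setInt("yarn.app.mapreduce.am.resource.mb", 1536);  // the MR AM's own container
    return Job.getInstance(conf, "differently-sized-tasks");
  }
}
```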

Page 42: YARN

42  

YARN Scheduling

[Diagram: a ResourceManager, Application Masters 1 and 2, and Nodes 1-3.]

Page 43: YARN

43  

YARN Scheduling

[AM1 → RM: "I want 2 containers with 1024 MB and 1 core each."]

Page 44: YARN

44  

YARN Scheduling

[RM → AM1: "Noted."]

Page 45: YARN

45  

YARN Scheduling

[AM1 → RM, heartbeating: "I'm still here."]

Page 46: YARN

46  

YARN Scheduling

[RM: "I'll reserve some space on Node 1 for AM1."]

Page 47: YARN

47  

YARN Scheduling

[AM1 → RM: "Got anything for me?"]

Page 48: YARN

48  

YARN Scheduling

[RM → AM1: "Here's a security token to let you launch a container on Node 1."]

Page 49: YARN

49  

YARN Scheduling

[AM1 → NodeManager on Node 1: "Hey, launch my container with this shell command."]

Page 50: YARN

50  

YARN Scheduling

[Diagram: the container is now running on Node 1.]
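
The exchange sketched on the last few slides maps fairly directly onto YARN's AMRMClient/NMClient libraries. The following condensed, hypothetical ApplicationMaster (error handling, tokens, and completion tracking omitted; "sleep 30" is a placeholder command) registers with the RM, asks for two 1024 MB / 1 vcore containers, and launches a shell command in each granted container:

```java
import java.util.Collections;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.util.Records;

public class MiniAppMaster {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
    rm.init(conf);
    rm.start();
    rm.registerApplicationMaster("", 0, "");          // register and start heartbeating

    NMClient nm = NMClient.createNMClient();
    nm.init(conf);
    nm.start();

    // "I want 2 containers with 1024 MB and 1 core each"
    Resource capability = Resource.newInstance(1024, 1);
    Priority priority = Priority.newInstance(0);
    for (int i = 0; i < 2; i++) {
      rm.addContainerRequest(new ContainerRequest(capability, null, null, priority));
    }

    int launched = 0;
    while (launched < 2) {
      // Heartbeat: "Got anything for me?" The RM answers with granted containers and tokens.
      for (Container container : rm.allocate(0.1f).getAllocatedContainers()) {
        ContainerLaunchContext ctx = Records.newRecord(ContainerLaunchContext.class);
        ctx.setCommands(Collections.singletonList("sleep 30"));  // "launch my container with this shell command"
        nm.startContainer(container, ctx);                       // talks to the NodeManager on the granted node
        launched++;
      }
      Thread.sleep(1000);
    }

    rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
  }
}
```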

Page 51: YARN

51  

Resources on a Node (5 GB)

[Diagram: a single 5 GB node packed with mixed-size containers: an MR AM (1024 MB), maps of 512, 1024, 256, and 256 MB, and reduces of 1536 and 512 MB, 5120 MB in all.]

Page 52: YARN

52  

FairScheduler  (FS)  

• When space becomes available to run a task on the cluster, which application do we give it to?

• Find  the  job  that  is  using  the  least  space.  

Page 53: YARN

53  

FS:  Apps  and  Queues  

• Apps go in "queues" • Share fairly between queues • Share fairly between apps within queues
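
From a job's point of view, queue placement is just a property. A tiny sketch (the queue path is hypothetical; the actual queues come from the Fair Scheduler allocation file):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class QueuePlacement {
  public static Job buildJob() throws Exception {
    Configuration conf = new Configuration();
    // Submit into a specific Fair Scheduler queue; sibling queues get their fair shares,
    // and apps inside this queue share its resources fairly too.
    conf.set("mapreduce.job.queuename", "root.marketing.reports");   // hypothetical queue path
    return Job.getInstance(conf, "queued-job");
  }
}
```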

Page 54: YARN

54  

FS: Hierarchical Queues

[Diagram: a queue tree. Root (capacity: 12 GB memory, 24 CPU cores) has child queues Marketing, R&D, and Sales, each with a fair share of 4 GB and 8 cores; one of these is subdivided into Jim's Team and Bob's Team, each with a fair share of 2 GB and 4 cores.]

Page 55: YARN

55  

FS: Fast and Slow Lanes

[Diagram: Root (capacity: 12 GB memory, 24 CPU cores) with Marketing and Sales queues (fair share 4 GB / 8 cores each), plus a "fast lane" capped at a max share of 1 GB / 1 core and a "slow lane" with a fair share of 3 GB / 7 cores.]

Page 56: YARN

56  

• Traverse the tree starting at the root queue • Offer resources to subqueues in order of how few resources they're using

FS:  Fairness  for  Hierarchies  

Page 57: YARN

57  

FS: Hierarchical Queues

[Diagram: the same queue tree (Root; Marketing, R&D, Sales; Jim's Team and Bob's Team), shown without capacities to illustrate the traversal described on the previous slide.]

Page 58: YARN

58  

FS: Multi-resource scheduling

• Scheduling based on multiple resources • CPU, memory • Future: disk, network

• Why multiple resources? • Better utilization • More fair

Page 59: YARN

59  

FS:  More  features  


• Preemption • To avoid starvation, preempt tasks using more than their fair share after the preemption timeout

• Warn applications; an application can choose to kill any of its containers

• Locality through delay scheduling • Try to give node-local or rack-local resources by waiting for some time
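
A sketch of the related yarn-site.xml switches (standard property names; the values are illustrative, and the per-queue preemption timeouts themselves live in the Fair Scheduler allocation file):

```java
import org.apache.hadoop.conf.Configuration;

public class FairSchedulerFeatures {
  public static Configuration configure() {
    Configuration conf = new Configuration();
    // Allow the Fair Scheduler to preempt containers from queues over their fair share.
    conf.setBoolean("yarn.scheduler.fair.preemption", true);
    // Delay scheduling: the fraction of the cluster an app may pass up while
    // waiting for a node-local (or rack-local) scheduling opportunity.
    conf.setFloat("yarn.scheduler.fair.locality.threshold.node", 0.5f);   // illustrative value
    conf.setFloat("yarn.scheduler.fair.locality.threshold.rack", 0.7f);   // illustrative value
    return conf;
  }
}
```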

 

Page 60: YARN

60  

Enforcing  resource  limits  


• Memory • Monitor process usage and kill the container if it exceeds its limit • Disable virtual memory checking • Physical memory checking is being improved

• CPU  • Cgroups  
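
The corresponding NodeManager-side keys, sketched with their usual meanings (values are illustrative, and the cgroups handler requires the LinuxContainerExecutor):

```java
import org.apache.hadoop.conf.Configuration;

public class ContainerLimitConfig {
  public static Configuration configure() {
    Configuration conf = new Configuration();
    // Memory: the NM monitors each container's process tree and kills it if it exceeds the limit.
    conf.setBoolean("yarn.nodemanager.pmem-check-enabled", true);   // physical memory check
    conf.setBoolean("yarn.nodemanager.vmem-check-enabled", false);  // virtual memory check, often disabled
    conf.setFloat("yarn.nodemanager.vmem-pmem-ratio", 2.1f);        // only matters if the vmem check is on
    // CPU: enforce vcores with Linux cgroups via the LinuxContainerExecutor.
    conf.set("yarn.nodemanager.linux-container-executor.resources-handler.class",
        "org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler");
    return conf;
  }
}
```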

Page 61: YARN

61  Microsoft Office EULA. Really.

Page 62: YARN

62  

MR1  to  MR2  Gotchas  


• AMs can take up all resources • Symptom: submitted jobs don't run • Fix in progress to limit the maximum number of applications • Workaround: scheduler allocations to limit the number of applications

• How to run 4 maps and 2 reduces per node? • Don't try to tune the number of tasks per node • Set assignMultiple to false to spread allocations

Page 63: YARN

63  

MR1  to  MR2  Gotchas  


• Comparing  MR1  and  MR2  benchmarks  • TestDFSIO  runs  best  on  dedicated  CPU/disk,  harder  to  pin  • TeraSort  changed:  less  compressible  ==  more  network  xfer  

• Resource allocation vs resource consumption • The RM allocates container resources; the JVM heap is specified elsewhere • JVM overhead is not included • Mind your mapred.[map|reduce].child.java.opts
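
A sketch of the distinction: the memory.mb properties are what the RM allocates (the container's limit), the java.opts heap actually runs inside it, and the gap between them is the JVM/native overhead the slide warns about. The numbers here are illustrative; mapreduce.map.java.opts is the MR2 spelling of the older mapred.map.child.java.opts.

```java
import org.apache.hadoop.conf.Configuration;

public class AllocationVsConsumption {
  public static Configuration configure() {
    Configuration conf = new Configuration();
    conf.setInt("mapreduce.map.memory.mb", 1536);        // container size the RM allocates
    conf.set("mapreduce.map.java.opts", "-Xmx1152m");    // heap inside the container, with headroom
    conf.setInt("mapreduce.reduce.memory.mb", 3072);
    conf.set("mapreduce.reduce.java.opts", "-Xmx2304m");
    return conf;
  }
}
```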

Page 64: YARN

64  

MR1  to  MR2  Gotchas  


• Changes in logs make tracing problems harder • MR1: distributed grep on the JobId • YARN logs are more generic and deal with containers, not apps

Page 65: YARN
Page 66: YARN

66  

Job  History  

• Job  History  Viewing  was  moved  to  its  own  server:  Job  History  Server  

• Helps  with  load  on  RM  (JT  equivalent)  • Helps  separate  MR  from  YARN  

 

Page 67: YARN

67  

How Does History Flow?

• AM • While running, keeps track of all events during execution

• On success, before finishing up • Writes the history information to done_intermediate_dir

• The  JHS    • periodically  scans  the  done_intermediate  dir    • moves  the  files  to  done_dir  • starts  showing  the  history  

Page 68: YARN

68  

History: Important Configuration Properties

• yarn.app.mapreduce.am.staging-dir • Default (CM): /user ← want this also for security • Default (CDH): /tmp/hadoop-yarn/staging • Staging directory for MapReduce applications

• mapreduce.jobhistory.done-dir

• Default: ${yarn.app.mapreduce.am.staging-dir}/history/done • Final location in HDFS for history files

• mapreduce.jobhistory.intermediate-done-dir

• Default: ${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate • Location in HDFS where AMs dump history files

 

Page 69: YARN

69  

History: Important Configuration Properties

• mapreduce.jobhistory.max-age-ms • Default  604800000  (7  days)  • Max  age  before  JHS  deletes  history  

 • mapreduce.jobhistory.move.interval-ms

• Default:  180000  (3  min)  • Frequency  at  which  JHS  scans  the  intermediate_done  dir  

Page 70: YARN

70  

History:  Miscellaneous  

• The JHS runs as 'mapred', the AM runs as the user who submitted the job, and the RM runs as 'yarn' • The done-intermediate dir needs to be writable by the user who submitted the job and readable by 'mapred' • The RM, AM, and JHS should have identical versions of the jobhistory-related properties so they all "agree"

Page 71: YARN

71  

Application History Server / Timeline Server


• Work in progress to capture history and other information for non-MR YARN applications

Page 72: YARN

72  

YARN  Container  Logs  


• While the application is running • Local to the NM: yarn.nodemanager.log-dirs

• After the application finishes • Logs aggregated to HDFS

• yarn.nodemanager.remote-app-log-dir

• Disable aggregation? • yarn.log-aggregation-enable

   

Page 73: YARN
Page 74: YARN
Page 75: YARN

75  

Interacting with a YARN cluster


• Java API • MR1 and MR2 APIs are compatible

• REST API • RM, NM, JHS all have REST APIs that are very useful

• Llama (Long-Lived Application Master) • Cloudera Impala can reserve, use, and release resource allocations without using YARN-managed container processes

• CLI • yarn rmadmin, yarn application, etc.

• Web UI • New and "improved"; takes time to get used to
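
As a taste of the REST interface, here is a minimal Java sketch that lists running applications from the RM's web service (the hostname is a placeholder; 8088 is the default RM web port):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class RmRestExample {
  public static void main(String[] args) throws Exception {
    // ResourceManager REST endpoint for cluster applications, filtered to RUNNING.
    URL url = new URL("http://resourcemanager.example.com:8088/ws/v1/cluster/apps?states=RUNNING");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestProperty("Accept", "application/json");
    try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
      String line;
      while ((line = in.readLine()) != null) {
        System.out.println(line);   // JSON description of the running applications
      }
    }
  }
}
```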

Page 76: YARN
Page 77: YARN

77

• MR2 • Cloudera Impala • Apache Spark • Others? Custom? • Apache Slider (incubating); not production-ready

•  Accumulo  • HBase  •  Storm  

YARN Applications

Page 78: YARN
Page 79: YARN

79

• Shipping • Enabled by default on CDH 5+ • Included (but not enabled) for the past two years

• Supported  • Recommended  

The  Cloudera  View  of  YARN  

Page 80: YARN

80

• Benchmarking is harder • different utilization paradigm • "whole cluster" benchmarks more important, e.g. SWIM

• Tuning is still largely trial and error • MR1 was the same originally • YARN/MR2 will get there eventually

Growing  Pains  

Page 81: YARN

81

• A few are using it in production • Many are exploring

• Spark • Impala via Llama

• Most are waiting

What  Are  Customers  Doing?  

Page 82: YARN

82

• Mesos  •  designed  to  be  completely  general  purpose  • more  burden  on  app  developer  (offer  model  vs  app  request)  

• YARN • designed with Hadoop in mind • supports Kerberos • more robust/familiar scheduling • rack/machine locality out of the box

• Supportability  •  all  commercial  Hadoop  vendors  support  YARN  •  support  for  Mesos  limited  to  startup  Mesosphere  

Why  not  Mesos?  

Page 83: YARN

83

Is  This  the  End  for  MapReduce?  

Page 84: YARN

Extra special thanks: ALL OF YOU

Page 85: YARN

85

• CC BY 2.0 flik https://flic.kr/p/4RVoUX • CC BY 2.0 Ian Sane https://flic.kr/p/nRyHxd • CC BY-NC 2.0 lollyknit https://flic.kr/p/49C1Xi • CC BY-ND 2.0 jankunst https://flic.kr/p/deU71s • CC BY-SA 2.0 pierrepocs https://flic.kr/p/9mgdMd • CC BY-SA 2.0 bekathwia https://flic.kr/p/4FpABU • CC BY-NC-ND 2.0 digitalnc https://flic.kr/p/dxyTt1 • CC BY-NC-ND 2.0 arselectronica https://flic.kr/p/7yw8z2 • CC BY-NC-ND 2.0 yum9me https://flic.kr/p/81hQ49 • CC BY-NC-SA 2.0 jimnix https://flic.kr/p/gsqpWC • Microsoft Office EULA (really)

Image  Credits  

Page 86: YARN

86  

Thank You! Alex Moundalexis @technmsg Insert witty tagline here.

