Improving Hadoop Performance via Linux

Improving Hadoop Cluster Performance via Linux Configuration 2014 Hadoop Summit – San Jose, California Alex Moundalexis alexm at clouderagovt.com @technmsg

Upload: alex-moundalexis

Post on 07-Nov-2014


DESCRIPTION

Administering a Hadoop cluster isn't easy. Many Hadoop clusters suffer from Linux configuration problems that can negatively impact performance. With vast and sometimes confusing config/tuning options, it can be tempting (and scary) for a cluster administrator to make changes to Hadoop when cluster performance isn't as expected. Learn how to improve Hadoop cluster performance and eliminate common problem areas, applicable across use cases, using a handful of simple Linux configuration changes.

TRANSCRIPT

Page 1: Improving Hadoop Performance via Linux

Improving Hadoop Cluster Performance via Linux Configuration
2014 Hadoop Summit – San Jose, California
Alex Moundalexis
alexm at clouderagovt.com
@technmsg

Page 2: Improving Hadoop Performance via Linux

Tips from a Former SA

Page 3: Improving Hadoop Performance via Linux

CC BY 2.0 / Richard Bumgardner

Been there, done that.

Page 4: Improving Hadoop Performance via Linux

Tips from a Former SA Field Guy

Page 5: Improving Hadoop Performance via Linux

CC BY 2.0 / Alex Moundalexis

Home sweet home.

Page 6: Improving Hadoop Performance via Linux

Tips from a Former SA Field Guy

Easy steps to take…

Page 7: Improving Hadoop Performance via Linux

Tips from a Former SA Field Guy

Easy steps to take… that most people don’t.

Page 8: Improving Hadoop Performance via Linux

What This Talk Isn’t About

• Deploying
  • Puppet, Chef, Ansible, homegrown scripts, intern labor

• Sizing & Tuning
  • Depends heavily on data and workload

• Coding
  • Unless you count STDOUT redirection

• Algorithms
  • I suck at math, but we’ll try some multiplication later

Page 9: Improving Hadoop Performance via Linux

“The answer to most Hadoop questions is it depends.”

Page 10: Improving Hadoop Performance via Linux

So What ARE We Talking About?

• Seven simple things
  • Quick
  • Safe
  • Viable for most environments and use cases

• Identify issue, then offer solution

• Note: Commands run as root or sudo

Page 11: Improving Hadoop Performance via Linux

1. Swapping

Bad news, best not to…

Page 12: Improving Hadoop Performance via Linux

Swapping

• A form of memory management
• When OS runs low on memory…
  • write blocks to disk
  • use now-free memory for other things
  • read blocks back into memory from disk when needed

• Also known as paging

Page 13: Improving Hadoop Performance via Linux

Swapping

• Problem: Disks are slow, especially to seek
• Hadoop is about maximizing IO
  • spend less time acquiring data
  • operate on data in place
  • large streaming reads/writes from disk

• Memory usage is limited within JVM
  • we should be able to manage our memory

Page 14: Improving Hadoop Performance via Linux

Disable Swap in Kernel

• Well, as much as possible.

• Immediate:
  # echo 0 > /proc/sys/vm/swappiness

• Persist after reboot:
  # echo "vm.swappiness = 0" >> /etc/sysctl.conf
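Appending with `>>` adds another line every time the command is run. A minimal sketch of an idempotent alternative follows; the helper name `set_sysctl_conf` is hypothetical (not from the talk), and it only edits the file without touching the running kernel.

```shell
# set_sysctl_conf FILE KEY VALUE
# Replace an existing "KEY = ..." line in FILE, or append one if absent.
# Hypothetical helper for illustration only.
set_sysctl_conf() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except an existing assignment of KEY; comments are kept.
    awk -F= -v k="$key" '{ name = $1; gsub(/[ \t]/, "", name); if (name != k) print }' \
        "$file" > "$tmp" 2>/dev/null || true
    echo "$key = $value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Usage (as root):
#   set_sysctl_conf /etc/sysctl.conf vm.swappiness 0
#   sysctl -p                      # reload /etc/sysctl.conf
#   cat /proc/sys/vm/swappiness    # confirm the running value
```

Running it twice leaves a single `vm.swappiness` line, which keeps the file readable after repeated provisioning runs.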

Page 15: Improving Hadoop Performance via Linux

Swapping Peculiarities

• Behavior varies based on Linux kernel
  • CentOS 6.4+ / Ubuntu 10.10+
  • For you kernel gurus, that’s Linux 2.6.32-303+

• Prior
  • We don’t swap, except to avoid OOM condition.

• After
  • We don’t swap, ever.

• Details: http://tiny.cloudera.com/noswap

Page 16: Improving Hadoop Performance via Linux

2. File Access Time

Disable this too.

Page 17: Improving Hadoop Performance via Linux

File Access Time

• Linux tracks access time
  • writes to disk even if all you did was read

• Problem
  • more disk seeks
  • HDFS is write-once, read-many
  • NameNode tracks access information for HDFS

Page 18: Improving Hadoop Performance via Linux

Don’t Track Access Time

• Mount volumes with noatime option
  • In /etc/fstab:
    /dev/sdc /data01 ext3 defaults,noatime 0

• Note: noatime implies nodiratime as well
• What about relatime?
  • Faster than atime but slower than noatime

• No reboot required
  • # mount -o remount /data01
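After the remount it is worth confirming that the option actually took effect. A minimal sketch, assuming a hypothetical helper `has_noatime` that reads a mounts table in the `/proc/mounts` layout (device, mount point, type, options):

```shell
# has_noatime MOUNTPOINT [MOUNTS_FILE]
# Succeeds if MOUNTPOINT is listed with the noatime option.
# MOUNTS_FILE defaults to /proc/mounts.
has_noatime() {
    awk -v mp="$1" '
        $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "noatime") found = 1
        }
        END { exit (found ? 0 : 1) }
    ' "${2:-/proc/mounts}"
}

# Usage:
#   has_noatime /data01 && echo "noatime is active on /data01"
```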

Page 19: Improving Hadoop Performance via Linux

3. Root Reserved Space

Reclaim it, impress your bosses!

Page 20: Improving Hadoop Performance via Linux

Root Reserved Space

• EXT3/4 reserve 5% of disk for root-owned files
  • On an OS disk, sure
  • System logs, kernel panics, etc.

Page 21: Improving Hadoop Performance via Linux

CC BY 2.0 / Alex Moundalexis

Disks used to be much smaller, right?

Page 22: Improving Hadoop Performance via Linux

Do The Math

• Conservative
  • 5% of 1 TB disk = 46 GB
  • 5 data disks per server = 230 GB
  • 5 servers per rack = 1.15 TB

• Quasi-Aggressive
  • 5% of 4 TB disk = 186 GB
  • 12 data disks per server = 2.23 TB
  • 18 servers per rack = 40.1 TB

• That’s a LOT of unused storage!
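The per-disk figures on this slide match binary gigabytes: 5% of a 10^12-byte disk is 50 GB in decimal units but about 46.6 GiB. That reading is an inference from the numbers, not something the slide states. A small sketch of the arithmetic, with hypothetical function names:

```shell
# reserved_gib DISK_TB PERCENT
# Whole GiB reserved on one disk of DISK_TB decimal terabytes (10^12 bytes).
reserved_gib() {
    echo $(( $1 * 1000000000000 * $2 / 100 / 1073741824 ))
}

# rack_reserved_gib DISK_TB PERCENT DISKS_PER_SERVER SERVERS_PER_RACK
# Reserved space summed over every data disk in a rack.
rack_reserved_gib() {
    echo $(( $(reserved_gib "$1" "$2") * $3 * $4 ))
}

# Usage:
#   reserved_gib 1 5              # 46
#   reserved_gib 4 5              # 186
#   rack_reserved_gib 1 5 5 5     # 1150
#   rack_reserved_gib 4 5 12 18   # 40176
```

The last figure is slightly above the slide's 40.1 TB because the slide rounds the per-server total to 2.23 TB before multiplying by 18.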

Page 23: Improving Hadoop Performance via Linux

Root Reserved Space

• On a Hadoop data disk, no root-owned files

• When creating a partition
  # mkfs.ext3 -m 0 /dev/sdc

• On existing partitions
  # tune2fs -m 0 /dev/sdc

• 0 is safe, 1 is for the ultra-paranoid
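To see what a data disk currently reserves, the `Block count` and `Reserved block count` lines of `tune2fs -l` can be compared. A minimal sketch, assuming a hypothetical filter `reserved_percent` that reads that output on stdin:

```shell
# reserved_percent
# Read `tune2fs -l DEVICE` output on stdin and print the share of blocks
# reserved for root, as a percentage with one decimal place.
reserved_percent() {
    awk -F': *' '
        $1 == "Block count"          { total = $2 }
        $1 == "Reserved block count" { reserved = $2 }
        END { if (total > 0) printf "%.1f\n", 100 * reserved / total }
    '
}

# Usage (as root; /dev/sdc is the example device from the slide):
#   tune2fs -l /dev/sdc | reserved_percent
```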

Page 24: Improving Hadoop Performance via Linux

4. Name Service Cache Daemon

Turn it on, already!

Page 25: Improving Hadoop Performance via Linux

Name Service Cache Daemon

• Daemon that caches name service requests
  • Passwords
  • Groups
  • Hosts

• Helps weather network hiccups
• Helps more with high latency LDAP, NIS, NIS+
• Small footprint
• Zero configuration required

Page 26: Improving Hadoop Performance via Linux

Name Service Cache Daemon

• Hadoop nodes
  • largely a network-based application
  • on the network constantly
  • issue lots of DNS lookups, especially HBase & distcp
  • can thrash DNS servers

• Reducing latency of service requests? Smart.
• Reducing impact on shared infrastructure? Smart.

Page 27: Improving Hadoop Performance via Linux

Name Service Cache Daemon

• Turn it on, let it work, leave it alone:
  # chkconfig --level 345 nscd on
  # service nscd start

• Check on it later:
  # nscd -g

• Unless using Red Hat SSSD; modify nscd config first!
  • Don’t use nscd to cache passwd, group, or netgroup
  • Red Hat, Using NSCD with SSSD. http://goo.gl/68HTMQ
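On a host that runs SSSD, a quick check can list which of those three caches nscd still has switched on. This is a sketch under the assumption that the config uses the usual `enable-cache <service> <yes|no>` lines; the function name `sssd_conflicts` is hypothetical.

```shell
# sssd_conflicts [NSCD_CONF]
# Print the caches that nscd still has enabled but that should be left to
# SSSD (passwd, group, netgroup). Prints nothing if all three are off.
sssd_conflicts() {
    awk '
        $1 == "enable-cache" && $3 == "yes" &&
        ($2 == "passwd" || $2 == "group" || $2 == "netgroup") { print $2 }
    ' "${1:-/etc/nscd.conf}"
}

# Usage:
#   sssd_conflicts    # empty output means only non-conflicting caches remain on
```

The hosts cache is deliberately not reported, since that is the one the talk wants nscd to keep.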

Page 28: Improving Hadoop Performance via Linux

5. File Handle Limits

Not a problem, until they are.

Page 29: Improving Hadoop Performance via Linux

File Handle Limits

• Kernel refers to files via a handle
  • Also called descriptors

• Linux is a multi-user system
• File handle limits protect the system from
  • Poor coding
  • Malicious users
  • Pictures of cats on the Internet

Page 30: Improving Hadoop Performance via Linux

Microsoft Office EULA. Really.

java.io.FileNotFoundException: (Too many open files)

Page 31: Improving Hadoop Performance via Linux

File Handle Limits

• Linux defaults usually not enough
• Increase maximum open files (default 1024)
  # echo hdfs - nofile 32768 >> /etc/security/limits.conf
  # echo mapred - nofile 32768 >> /etc/security/limits.conf
  # echo hbase - nofile 32768 >> /etc/security/limits.conf

• Bonus: Increase maximum processes too
  # echo hdfs - nproc 32768 >> /etc/security/limits.conf
  # echo mapred - nproc 32768 >> /etc/security/limits.conf
  # echo hbase - nproc 32768 >> /etc/security/limits.conf

• Note: Cloudera Manager will do this for you.
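To audit what a node is configured with, the entries can be read back out of the file. A minimal sketch, assuming the `<domain> <type> <item> <value>` layout used on the slide; `limit_for` is a hypothetical helper name.

```shell
# limit_for FILE USER ITEM
# Print the value configured for USER and ITEM (e.g. nofile, nproc) in a
# limits.conf-style file. If the pair appears more than once, the last wins.
limit_for() {
    awk -v u="$2" -v item="$3" '
        $1 == u && $3 == item { value = $4 }
        END { if (value != "") print value }
    ' "$1"
}

# Usage:
#   limit_for /etc/security/limits.conf hdfs nofile
#   grep 'open files' /proc/<pid>/limits   # what a running daemon actually got
```

The file only shows intent; limits apply at login, so a daemon started before the change keeps its old values until it is restarted.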

Page 32: Improving Hadoop Performance via Linux

6. Dedicated Disk for OS and Logs

Don’t be tempted to share, even on monster disks.

Page 33: Improving Hadoop Performance via Linux

The Situation in Easy Steps

1. Your new server has a dozen 1 TB disks
2. Eleven disks are used to store data
3. One disk is used for the OS
  • 20 GB for the OS
  • 980 GB sits unused
4. Someone asks “can we store data there too?”
5. Seems reasonable, lots of space… “OK, why not.”

Sound familiar?

Page 34: Improving Hadoop Performance via Linux

Microsoft Office EULA. Really.

I don’t understand it, there’s no consistency to these run times!

Page 35: Improving Hadoop Performance via Linux

No Love for Shared Disk

• Our quest for data gets interrupted a lot:
  • OS operations
  • OS logs
  • Hadoop logging, quite chatty
  • Hadoop execution
  • userspace execution

• Disk seeks are slow, remember?

Page 36: Improving Hadoop Performance via Linux

Dedicated Disk for OS and Logs

• At install time
  • Disk 0, OS & logs
  • Disk 1-n, Hadoop data

• After install, more complicated effort, requires manual HDFS block rebalancing:
  1. Take down HDFS
    • If you can do it in under 10 minutes, just the DataNode
  2. Move or distribute blocks from disk0/dir to disk[1-n]/dir
  3. Remove dir from HDFS config (dfs.data.dir)
  4. Start HDFS

Page 37: Improving Hadoop Performance via Linux

7. Name Resolution

Sane, both forward and reverse.

Page 38: Improving Hadoop Performance via Linux

Name Resolution Options

1. Hosts file, if you must
2. DNS, much preferred

Page 39: Improving Hadoop Performance via Linux

Name Resolution with Hosts File

• Set canonical names properly

• Right
  10.1.1.1 r01m01.cluster.org r01m01 master1
  10.1.1.2 r01w01.cluster.org r01w01 worker1

• Wrong
  10.1.1.1 r01m01 r01m01.cluster.org master1
  10.1.1.2 r01w01 r01w01.cluster.org worker1

Page 40: Improving Hadoop Performance via Linux

Name Resolution with Hosts File

• Set loopback address properly
  • Ensure 127.0.0.1 resolves to localhost, NOT hostname

• Right
  127.0.0.1 localhost

• Wrong
  127.0.0.1 r01m01
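Both hosts-file mistakes can be caught mechanically. The sketch below uses a hypothetical `check_hosts` function and a simple heuristic that is an assumption of this example: a canonical name without a dot is treated as "not an FQDN".

```shell
# check_hosts [HOSTS_FILE]
# Flag the two mistakes from the slides:
#   - 127.0.0.1 whose first name does not start with "localhost"
#   - any other IPv4 entry whose first (canonical) name contains no dot
# Prints one line per problem and fails if any were found.
check_hosts() {
    awk '
        /^[ \t]*#/ || NF < 2 { next }      # skip comments and blank lines
        $1 == "127.0.0.1" {
            if ($2 !~ /^localhost/) { print "loopback maps to " $2; bad = 1 }
            next
        }
        $1 ~ /^[0-9.]+$/ && $2 !~ /\./ {
            print $1 " lists short name first: " $2; bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "${1:-/etc/hosts}"
}

# Usage:
#   check_hosts && echo "hosts file looks sane"
```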

Page 41: Improving Hadoop Performance via Linux

Name Resolution with DNS

• Forward
• Reverse

• Hostname should MATCH the FQDN in DNS
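A round-trip check makes the forward/reverse requirement concrete. This is a sketch, not the speaker's procedure: `check_resolution` and the `LOOKUP` override are hypothetical, and `getent hosts` is used as the resolver on the assumption that it returns the address first and the canonical name second.

```shell
# Resolver command; override it to test without a network.
LOOKUP=${LOOKUP:-"getent hosts"}

# check_resolution FQDN
# Resolve FQDN to an address, resolve that address back to a name, and
# succeed only if the name returned matches FQDN exactly.
check_resolution() {
    fqdn=$1
    ip=$($LOOKUP "$fqdn" | awk 'NR == 1 { print $1 }')
    [ -n "$ip" ] || { echo "forward lookup failed for $fqdn"; return 1; }
    back=$($LOOKUP "$ip" | awk 'NR == 1 { print $2 }')
    if [ "$back" = "$fqdn" ]; then
        echo "OK: $fqdn <-> $ip"
    else
        echo "MISMATCH: $fqdn -> $ip -> ${back:-nothing}"
        return 1
    fi
}

# Usage:
#   check_resolution "$(hostname -f)"
```

Passing `hostname -f` covers the last bullet as well: the name the node reports for itself is the one being round-tripped.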

Page 42: Improving Hadoop Performance via Linux

This Is What You Ought to See

Page 43: Improving Hadoop Performance via Linux

Name Resolution Errata

• Mismatches? Expect odd results.
  • Problems starting DataNodes
  • Non-FQDN in Web UI links
  • Security features are extra sensitive to FQDN

• Errors so common that link to FAQ is included in logs!
  • http://wiki.apache.org/hadoop/UnknownHost

• Get name resolution working BEFORE enabling nscd!

Page 44: Improving Hadoop Performance via Linux

Summary

Time to take out your camera phones…

Page 45: Improving Hadoop Performance via Linux

Summary

1. disable vm.swappiness
2. data disks: mount with noatime option
3. data disks: disable root reserve space
4. enable nscd
5. increase file handle limits
6. use dedicated OS/logging disk
7. sane name resolution

http://tiny.cloudera.com/7steps

Page 46: Improving Hadoop Performance via Linux

Recommended Reading

• Hadoop Operations http://amzn.to/1hDaN9B

Page 47: Improving Hadoop Performance via Linux

Questions?

Preferably related to the talk…

Page 48: Improving Hadoop Performance via Linux

Thank You!
Alex Moundalexis
alexm at clouderagovt.com
@technmsg

We’re hiring, kids! Well, not kids.

Page 49: Improving Hadoop Performance via Linux

8. Bonus Round

Because we had enough time…

Page 50: Improving Hadoop Performance via Linux

Other Things to Check

• Disk IO
  • hdparm
    • # hdparm -Tt /dev/sdc
    • Looking for at least 70 MB/s from 7200 RPM disks
    • Slower could indicate a failing drive, disk controller, array, etc.
  • dd
    • http://romanrm.ru/en/dd-benchmark
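The 70 MB/s rule of thumb can be applied automatically across many disks. A minimal sketch, assuming hdparm's usual `Timing buffered disk reads: ... = N MB/sec` output line; `disk_read_ok` is a hypothetical name.

```shell
# disk_read_ok [MIN_MB_PER_SEC]
# Read `hdparm -t DEVICE` (or -Tt) output on stdin, print the buffered read
# rate, and succeed only if it meets the threshold (default 70 MB/s).
# Exits with status 2 if no measurement line is found.
disk_read_ok() {
    awk -v min="${1:-70}" '
        /buffered disk reads/ { rate = $(NF - 1); seen = 1 }
        END {
            if (!seen) exit 2
            print rate " MB/sec"
            exit (rate + 0 >= min + 0) ? 0 : 1
        }
    '
}

# Usage (as root):
#   hdparm -t /dev/sdc | disk_read_ok || echo "/dev/sdc is slower than expected"
```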

Page 51: Improving Hadoop Performance via Linux

Other Things to Check

• Disable Red Hat Transparent Huge Pages (RH6+ Only)
  • Can reduce elevated CPU usage
  • In rc.local:
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled

• Reference: Linux 6 Transparent Huge Pages and Hadoop Workloads, http://goo.gl/WSF2qC
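The control files report the active setting in square brackets, for example `[always] never`. A small sketch for reading it back after the change; `thp_state` is a hypothetical helper, and the default path is the Red Hat 6 one from the slide (other kernels generally use `/sys/kernel/mm/transparent_hugepage/` instead).

```shell
# thp_state [FILE]
# Print the active value from a transparent huge page control file, where
# the kernel marks the current choice with brackets.
thp_state() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' \
        "${1:-/sys/kernel/mm/redhat_transparent_hugepage/enabled}"
}

# Usage:
#   thp_state                                                   # enabled
#   thp_state /sys/kernel/mm/redhat_transparent_hugepage/defrag # defrag
```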

Page 52: Improving Hadoop Performance via Linux

Other Things to Check

• Enable Jumbo Frames
  • Only if your network infrastructure supports it!
  • Can easily (and arguably) boost throughput by 10-20%
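Whether the path really carries jumbo frames can be tested with a ping that forbids fragmentation. A sketch assuming a 9000-byte MTU, which the slide does not specify; the interface and peer names in the usage lines are placeholders.

```shell
# ping_payload MTU
# Largest ICMP payload that fits in one unfragmented IPv4 packet of size MTU:
# subtract the 20-byte IPv4 header and the 8-byte ICMP header.
ping_payload() {
    echo $(( $1 - 20 - 8 ))
}

# Usage (as root; eth0 and the peer host are placeholders):
#   ip link set dev eth0 mtu 9000
#   ping -M do -c 3 -s "$(ping_payload 9000)" r01w01.cluster.org
```

If any switch or host on the path is still at a 1500-byte MTU, the ping fails instead of silently fragmenting, which is exactly the case the "only if your network supports it" warning is about.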

Page 53: Improving Hadoop Performance via Linux

Other Things to Check

• Monitor Everything
  • How else will you know what’s happening?
  • Nagios, Ganglia, CM, Ambari
