Improving Hadoop Cluster Performance via Linux Configuration
DevIgnition 2014 – Dulles, Virginia
Alex Moundalexis // @technmsg
2 © Cloudera, Inc. All rights reserved.
Tips from a former system administrator
CC BY 2.0 / Richard Bumgardner
Been there, done that.
Tips from a former system administrator field guy
CC BY 2.0 / Alex Moundalexis
Home sweet home.
Tips
Easy steps to take… that most people don’t.
What this talk isn’t about
• Deploying
  • Puppet, Chef, Ansible, homegrown scripts, intern labor
• Sizing & Tuning
  • Depends heavily on data and workload
• Coding
  • Unless you count STDOUT redirection
• Algorithms
  • I suck at math, but we’ll try some multiplication later
“The answer to most Hadoop questions is… it depends.” (helpful, right?)
So what ARE we talking about?
• Seven simple things
  • Quick
  • Safe
  • Viable for most environments and use cases
• Identify issue, then offer solution
• Note: Commands run as root or sudo
1. Swapping
Bad news, best not to.
Swapping
• A form of memory management
• When OS runs low on memory…
  • write blocks to disk
  • use now-free memory for other things
  • read blocks back into memory from disk when needed
• Also known as paging
Swapping
• Problem: Disks are slow, especially to seek
• Hadoop is about maximizing IO
  • spend less time acquiring data
  • operate on data in place
  • large streaming reads/writes from disk
• Memory usage is somewhat limited within JVM
  • we should be able to manage our memory
  • account for JVM overhead
Limit swapping in kernel
• Well, as much as possible.
• Immediate:
  # echo 1 > /proc/sys/vm/swappiness
• Persist after reboot:
  # echo "vm.swappiness = 1" >> /etc/sysctl.conf
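Because the persist step appends with >>, running it twice stacks duplicate lines. A minimal sketch for checking what is already in place before appending again; persisted_swappiness is a hypothetical helper name, not a system tool, and it assumes the last matching line in the file is the one that takes effect.

```shell
# Sketch only: persisted_swappiness is a hypothetical helper, not a system tool.
persisted_swappiness() {
  # Print the last vm.swappiness value found in a sysctl-style file ($1).
  sed -n 's/^[[:space:]]*vm\.swappiness[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" 2>/dev/null | tail -n 1
}

# Value the kernel is using right now (empty on a non-Linux host).
echo "running:   $(cat /proc/sys/vm/swappiness 2>/dev/null)"
# Value that would be applied at boot (empty if nothing is configured).
echo "persisted: $(persisted_swappiness /etc/sysctl.conf)"
```

If the two lines disagree, the running value will not survive a reboot.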
Swapping peculiarities
• Behavior varies based on Linux kernel
  • CentOS 6.4+ / Ubuntu 10.10+
  • For you kernel gurus, that’s Linux 2.6.32-303+
• Prior
  • We don’t swap, except to avoid OOM condition.
• After
  • We don’t swap, ever.
• Details: http://tiny.cloudera.com/noswap
2. File Access Time
Disable this too.
File access time
• Linux tracks access time
  • writes to disk even if all you did was read
• Problem
  • more disk seeks
  • HDFS is write-once, read-many
  • NameNode tracks access information for HDFS
Don’t track access time
• Mount volumes with noatime option
  • In /etc/fstab:
    /dev/sdc /data01 ext3 defaults,noatime 0
• Note: noatime assumes nodiratime as well
• What about relatime?
  • Faster than atime but slower than noatime
• No reboot required
  • # mount -o remount /data01
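After the remount it is worth confirming the option took. A rough sketch that lists ext3/ext4 mount points still missing noatime; mounts_without_noatime is a made-up helper name, and it assumes the usual /proc/mounts layout (device, mount point, type, options).

```shell
# Sketch only: mounts_without_noatime is a hypothetical helper.
mounts_without_noatime() {
  # $1 = file in /proc/mounts format; print ext3/ext4 mount points
  # whose option list does not contain noatime.
  awk '$3 ~ /^ext[34]$/ && ("," $4 ",") !~ /,noatime,/ { print $2 }' "$1"
}

if [ -r /proc/mounts ]; then
  mounts_without_noatime /proc/mounts
fi
```

The OS disk will usually show up in the output; the ones to care about are the Hadoop data disks.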
3. Root Reserved Space
Reclaim it, impress your bosses!
Root reserved space
• EXT3/4 reserve 5% of disk for root-owned files
  • On an OS disk, sure
  • System logs, kernel panics, etc.
CC BY 2.0 / Alex Moundalexis
Disks used to be much smaller, right?
Do the math
• Conservative
  • 5% of 1 TB disk = 46 GB
  • 5 data disks per server = 230 GB
  • 5 servers per rack = 1.15 TB
• Quasi-Aggressive
  • 5% of 4 TB disk = 186 GB
  • 12 data disks per server = 2.23 TB
  • 18 servers per rack = 40.1 TB
• That’s a LOT of unused storage!
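The multiplication can be redone in a few lines of shell. This sketch assumes the slide treats a "1 TB" disk as 10^12 bytes, reports the reserve in binary gigabytes rounded down, and then converts GB to TB by 1000; that reading is an inference, but it matches the figures above. reserved_gb is a hypothetical helper name.

```shell
# Sketch only: reserved_gb is a hypothetical helper.
reserved_gb() {
  # $1 = disk size in bytes, $2 = reserve percentage.
  # Result is in binary gigabytes (2^30 bytes), rounded down.
  echo $(( $1 * $2 / 100 / 1073741824 ))
}

# Conservative: 1 TB disks, 5 data disks per server, 5 servers per rack.
disk=$(reserved_gb 1000000000000 5)
echo "conservative: ${disk} GB/disk, $(( disk * 5 )) GB/server, $(( disk * 5 * 5 )) GB/rack"

# Quasi-aggressive: 4 TB disks, 12 data disks per server, 18 servers per rack.
disk=$(reserved_gb 4000000000000 5)
echo "aggressive:   ${disk} GB/disk, $(( disk * 12 )) GB/server, $(( disk * 12 * 18 )) GB/rack"
```

Swap in your own disk size, disk count, and rack density to get the number for your cluster.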
Root reserved space
• On a Hadoop data disk, no root-owned files
• When creating a partition
  # mkfs.ext3 -m 0 /dev/sdc
• On existing partitions
  # tune2fs -m 0 /dev/sdc
• 0 is safe, 1 is for the ultra-paranoid
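To see what a disk reserves before and after the change, the superblock listing from tune2fs -l can be boiled down to a percentage. A small sketch, assuming the listing contains the usual "Block count:" and "Reserved block count:" lines; reserved_percent is a hypothetical helper name.

```shell
# Sketch only: reserved_percent is a hypothetical helper.
reserved_percent() {
  # Reads `tune2fs -l` output on stdin and prints the reserved share
  # of blocks as a whole-number percentage (rounded down).
  awk -F: '
    /^Block count/          { total = $2 + 0 }
    /^Reserved block count/ { res = $2 + 0 }
    END { if (total > 0) printf "%d\n", res * 100 / total }'
}

# Typical use, as root, against a real device:
#   tune2fs -l /dev/sdc | reserved_percent
```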
4. Name Service Cache
Turn it on, already!
Name Service Cache Daemon
• Daemon that caches name service requests
  • Passwords
  • Groups
  • Hosts
• Helps weather network hiccups
• Helps more with high latency LDAP, NIS, NIS+
• Small footprint
• Zero configuration required
Name Service Cache Daemon
• Hadoop nodes
  • largely a network-based application
  • on the network constantly
  • issue lots of name lookups, especially HBase & distcp
  • can thrash name servers
• Reducing latency of service requests? Smart.
• Reducing impact on shared infrastructure? Smart.
Name Service Cache Daemon
• Turn it on, let it work, leave it alone:
  # chkconfig --level 345 nscd on
  # service nscd start
• Check on it later:
  # nscd -g
• Unless using Red Hat SSSD; modify nscd config first!
  • Don’t use nscd to cache passwd, group, or netgroup
  • Red Hat, Using NSCD with SSSD. http://goo.gl/68HTMQ
5. File Handle Limits
Not a problem, until they are.
File handle limits
• Kernel refers to files via a handle
  • Also called descriptors
• Linux is a multi-user system
• File handles protect the system from
  • Poor coding
  • Malicious users
  • Poor coding of malicious users
  • Pictures of cats on the Internet
Microsoft Office EULA. Really.
java.io.FileNotFoundException: (Too many open files)
File handle limits
• Linux defaults usually not enough
• Increase maximum open files (default 1024)
  # echo hdfs - nofile 32768 >> /etc/security/limits.conf
  # echo mapred - nofile 32768 >> /etc/security/limits.conf
  # echo hbase - nofile 32768 >> /etc/security/limits.conf
• Bonus: Increase maximum processes too
  # echo hdfs - nproc 32768 >> /etc/security/limits.conf
  # echo mapred - nproc 32768 >> /etc/security/limits.conf
  # echo hbase - nproc 32768 >> /etc/security/limits.conf
• Note: Cloudera Manager will do this for you.
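Entries in limits.conf only reach sessions started after the change, so it helps to check what an account actually ends up with. A minimal sketch; check_nofile is a hypothetical helper name and 32768 is simply the target from the slide.

```shell
# Sketch only: check_nofile is a hypothetical helper.
check_nofile() {
  # $1 = current limit (a number or "unlimited"), $2 = wanted minimum.
  if [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]; then
    echo ok
  else
    echo low
  fi
}

# Report the open-file limit of the current shell and user.
now=$(ulimit -n)
echo "open files for $(id -un): ${now} ($(check_nofile "$now" 32768))"
```

Run it from a fresh session of the hdfs, mapred, or hbase account, not from the root shell that edited the file.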
6. Dedicated Disks
Don’t be tempted to share, even with monster disks.
The Situation
1. Your new server has a dozen 1 TB disks
2. Eleven disks are used to store data
3. One disk is used for the OS
   • 20 GB for the OS
   • 980 GB sits unused
4. Someone asks “can we store data there too?”
5. Seems reasonable, lots of space… “OK, why not.”
Sound familiar?
Microsoft Office EULA. Really.
“I don’t understand it, there’s no consistency to these run times!”
No love for shared disk
• Our quest for data gets interrupted a lot:
  • OS operations
  • OS logs
  • Hadoop logging, quite chatty
  • Hadoop execution
  • userspace execution
• Disk seeks are slow, remember?
Dedicated disk for OS and logs
• At install time
  • Disk 0, OS & logs
  • Disk 1-n, Hadoop data
• After install, more complicated effort, requires manual HDFS block rebalancing:
  1. Take down HDFS
     • If you can do it in under 10 minutes, just the DataNode
  2. Move or distribute blocks from disk0/dir to disk[1-n]/dir
  3. Remove dir from HDFS config (dfs.data.dir)
  4. Start HDFS
7. Name Resolution
Sane, both forward and reverse.
Name resolution options
1. Hosts file, if you must
2. DNS, much preferred
Name resolution with hosts file
• Set canonical names properly
• Right
  10.1.1.1 r01m01.cluster.org r01m01 master1
  10.1.1.2 r01w01.cluster.org r01w01 worker1
• Wrong
  10.1.1.1 r01m01 r01m01.cluster.org master1
  10.1.1.2 r01w01 r01w01.cluster.org worker1
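The "wrong" pattern is easy to spot mechanically: the first name after the address has no dot in it. A rough sketch under that assumption; noncanonical_hosts is a hypothetical helper name, and loopback and IPv6 lines are skipped to keep the output about cluster addresses.

```shell
# Sketch only: noncanonical_hosts is a hypothetical helper.
noncanonical_hosts() {
  # $1 = hosts-format file. Print "address name" for entries whose
  # first name is a short name instead of an FQDN.
  awk '
    NF < 2 || $1 ~ /^#/   { next }   # blank lines, comments
    $1 ~ /^127\./         { next }   # loopback
    $1 ~ /:/              { next }   # IPv6 entries
    $2 !~ /\./            { print $1, $2 }' "$1"
}

noncanonical_hosts /etc/hosts
```

No output means every remaining entry leads with a dotted name.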
Name resolution with hosts file
• Set loopback address properly
  • Ensure 127.0.0.1 resolves to “localhost,” NOT hostname
• Right
  127.0.0.1 localhost
• Wrong
  127.0.0.1 r01m01
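The loopback rule can be checked the same way. A minimal sketch; loopback_name is a hypothetical helper name, and it only looks at the first 127.0.0.1 line in the file.

```shell
# Sketch only: loopback_name is a hypothetical helper.
loopback_name() {
  # $1 = hosts-format file. Print the first name listed for 127.0.0.1.
  awk '$1 == "127.0.0.1" { print $2; exit }' "$1"
}

name=$(loopback_name /etc/hosts)
if [ "$name" = "localhost" ]; then
  echo "loopback looks sane"
else
  echo "127.0.0.1 maps to '${name}', check /etc/hosts"
fi
```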
Name resolution with DNS
• Forward
• Reverse
• Hostname should match the FQDN in DNS
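One way to test the round trip on a node is to resolve its own FQDN forward, resolve the resulting address back, and compare. This is a sketch, not the check shown in the talk: it assumes getent and hostname -f are available, it goes through whatever sources the system resolver is configured with (hosts file or DNS), and same_name is a hypothetical helper name.

```shell
# Sketch only: same_name is a hypothetical helper.
same_name() {
  # True when two host names match, ignoring case and a trailing dot.
  a=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  b=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
  [ -n "$a" ] && [ "${a%.}" = "${b%.}" ]
}

if command -v getent >/dev/null 2>&1; then
  fqdn=$(hostname -f 2>/dev/null) || fqdn=""
  ip=$(getent hosts "$fqdn" | awk '{ print $1; exit }')    # forward
  back=$(getent hosts "$ip" | awk '{ print $2; exit }')    # reverse
  if same_name "$fqdn" "$back"; then
    echo "OK: ${fqdn} <-> ${ip}"
  else
    echo "MISMATCH: ${fqdn} -> ${ip} -> ${back}"
  fi
fi
```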
This is what you ought to see
Name resolution errata
• Mismatches? Expect odd results.
  • Problems starting DataNodes
  • Non-FQDN in Web UI links
  • Security features are extra sensitive to FQDN
• Errors so common that link to FAQ is included in logs!
  • http://wiki.apache.org/hadoop/UnknownHost
• Get name resolution working BEFORE enabling nscd!
Summary
Now is the appropriate time to take out your camera phone.
A white background is supposedly better for printing. (who prints things anymore?)
A white background is supposedly better for printing. (but makes for very pale slides)
Summary
1. disable vm.swappiness
2. data disks: mount with noatime option
3. data disks: disable root reserve space
4. enable nscd
5. increase file handle limits
6. use dedicated OS/logging disk
7. sane name resolution
http://tiny.cloudera.com/7steps
Recommended reading
• Hadoop Operations http://amzn.to/1ydMrLf
Questions? Preferably related to the talk…
Thanks!
Alex Moundalexis | @technmsg
8. Bonus Round
Because we have enough time (or I talked really fast)…
Other things to check
• Disk IO
  • hdparm
    • # hdparm -Tt /dev/sdc
    • Looking for at least 70 MB/s from 7200 RPM disks
    • Slower could indicate a failing drive, disk controller, array, etc.
  • dd
    • http://romanrm.ru/en/dd-benchmark
Other things to check
• Disable Red Hat Transparent Huge Pages (RH6+ until 6.5)
  • Can reduce elevated CPU usage
  • In rc.local:
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
• Reference: Linux 6 Transparent Huge Pages and Hadoop Workloads, http://goo.gl/WSF2qC
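The sysfs files mark the active setting in square brackets, so the current state can be read back after the rc.local lines have run. A small sketch; thp_active is a hypothetical helper name, and the second path is the upstream kernel location, which is not on the slide and is included here as an assumption for non-Red Hat kernels.

```shell
# Sketch only: thp_active is a hypothetical helper.
thp_active() {
  # $1 = THP sysfs file. Print the bracketed (active) value.
  sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$1" 2>/dev/null
}

for f in /sys/kernel/mm/redhat_transparent_hugepage/enabled \
         /sys/kernel/mm/transparent_hugepage/enabled; do
  if [ -r "$f" ]; then
    echo "$f: $(thp_active "$f")"
  fi
done
```

The same helper works on the defrag file next to each enabled file.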
Other things to check
• Enable Jumbo Frames
  • Only if your network infrastructure supports it!
  • Can easily (and arguably) boost throughput by 10-20%
• Monitor and Chart Everything
  • How else will you know what’s happening?
  • Nagios
  • Ganglia