Hadoop Configuration Tuning Best Practices



Shrinivas Joshi, Advanced Micro Devices, Inc. {shrinivas.joshi@amd.com}

Introduction

Users who have deployed and tuned a Hadoop cluster for the first time will attest that optimizing Hadoop clusters is a daunting task.

Apart from the nature and implementation of Hadoop jobs, hardware, network infrastructure, OS, JVM, and Hadoop configuration properties all have a significant impact on performance and scalability of Hadoop workloads.

Using TeraSort as a reference workload, this poster presents the challenges involved in tuning the performance of a Hadoop setup, the tools available for monitoring and diagnosing performance bottlenecks, tuning best practices, empirical data on the effect of various tunings, and some future directions.

Challenges

• Hadoop is a large and complicated framework involving a number of entities interacting with each other across multiple hardware systems.
• Performance of Hadoop jobs is sensitive to every component of the cluster stack: Hadoop configuration, JVM, OS, network infrastructure, underlying hardware, and possibly BIOS settings.
• Hadoop supports a large number of configuration properties, and a good chunk of these can potentially impact performance.
• As with any large software system, diagnosing performance issues is a complicated task.

Tools

• Ganglia and Nagios – effective in monitoring cluster-wide statistics.
• Hadoop Vaidya – offers guidelines on mitigating framework-level bottlenecks.
• Hadoop Core – offers simple and insightful information on performance-affecting events.
• AMD CodeAnalyst™, Oracle® Solaris Studio, and Intel® VTune™ – profilers for in-depth source-level analysis.
• AMD MultEvent, Linux perf, and OProfile – platform-level profilers for in-depth analysis of hardware-level performance events.

Tuning Highlights

• On ClusterA, we achieved a speed-up factor of 5.6X.
• On ClusterB, we achieved a speed-up factor of 2.2X.

Conclusion & Future Direction

• Configuration tuning of all the components of the Hadoop stack is an important exercise and can offer a huge performance payoff.
• Different Hadoop workloads will have different characteristics, so it is important to experiment with different tuning options.
• We want to perform a similar study in multi-tenant and cloud environments.
• We want to explore JVM and JDK optimizations targeting the peculiar characteristics of the Hadoop framework and the jobs running on top of it.

JVM Configuration Tuning

• On 64-bit Oracle JDK 6 update 25, compressed pointers are ON by default. If you are using an older JDK version and compressed pointers are disabled, experiment with enabling them.
• Compressed pointers reduce memory footprint.
• The biased-locking feature of the Oracle HotSpot JDK improves performance in scenarios with uncontended locks.
• Given the architecture of the Hadoop framework, biased locking should generally improve performance.
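As a rough, hypothetical sketch, these flags could be passed to map and reduce task JVMs at job-submission time through mapred.child.java.opts (the Hadoop 0.20/1.x-era property); the examples jar name, heap size, and paths below are placeholders, not recommendations:

    # Hypothetical example: enable compressed pointers and biased locking in task JVMs.
    # Jar name, heap size, and input/output paths are placeholders.
    hadoop jar hadoop-examples.jar terasort \
      -Dmapred.child.java.opts="-Xmx1024m -XX:+UseCompressedOops -XX:+UseBiasedLocking" \
      /terasort/input /terasort/output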

BIOS Configuration Tuning

• When all the CPU cores on the hardware are not fully utilized, the processor could be downgrading CPU frequency.
• Experiment with ACPI and other power-related BIOS options.

• Modern AMD processors support a feature called HT Assist (a.k.a. probe filters). This feature reduces traffic on memory interconnects at the expense of some portion of the L3 cache.
• Experiment with this feature. Try disabling it if your job is cache-sensitive.

• The native command queuing (NCQ) feature of modern hard drives helps improve I/O performance by optimizing drive head movement.
• Experiment with the AHCI option in the BIOS, which can be used to enable NCQ mode.

• On some AMD processor-based systems, HyperTransport™ link speed and width are dynamically adjusted to reduce power consumption.
• If memory bandwidth is a bottleneck, experiment with this option.

OS Configuration Tuning

• Certain Linux distributions support EXT4 as the default file-system type.
• If you are using another type of file system, experiment with the EXT4 file system.

Hadoop Configuration Tuning

• Using the maximum possible map and reduce slots, identify the optimal number of disks that maximizes I/O bandwidth.
• Experiment with different HDFS block sizes.
• Identify heap usage and GC characteristics of framework processes and lock in their heap and GC settings.
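For example, one way to sweep HDFS block sizes is to regenerate the TeraSort input with TeraGen while overriding dfs.block.size; this is only a sketch, and the jar name, row count, block size, and paths are placeholders:

    # Hypothetical sweep of HDFS block size for the TeraSort input.
    # dfs.block.size is in bytes; 268435456 = 256 MB. Row count and paths are placeholders.
    hadoop jar hadoop-examples.jar teragen \
      -Ddfs.block.size=268435456 \
      10000000000 /terasort/input-256m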

• Try to avoid or eliminate intermediate disk I/O operations on the reduce side by tuning the heap sizes of reduce JVMs.
• If the reduce functionality is not heap-heavy, adjust the map output buffer size.
• Tune framework-related resources such as task tracker threads and the data node and name node handler counts.
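A hedged sketch of job-level, reduce-side knobs corresponding to the points above; all values are illustrative starting points, not recommendations:

    # Hypothetical reduce-side tuning: larger task heap, more map output copier threads,
    # larger shuffle buffer, and higher merge factor. Values are illustrative only.
    hadoop jar hadoop-examples.jar terasort \
      -Dmapred.child.java.opts=-Xmx2048m \
      -Dmapred.reduce.parallel.copies=20 \
      -Dmapred.job.shuffle.input.buffer.percent=0.80 \
      -Dio.sort.factor=100 \
      /terasort/input /terasort/output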

• Start with the biggest-payoff properties.
• Enable map output compression. Experiment with different codec choices. In our experience, the LZO codec performed better across multiple workloads.
• Experiment with the JVM reuse policy. This helps reduce disk and network I/O wait and increase CPU utilization.
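A sketch of how these properties might be set at submission time; the LZO codec class shown assumes the separately installed hadoop-lzo library, and -1 means a JVM is reused for an unlimited number of tasks:

    # Hypothetical example: LZO map output compression plus unlimited JVM reuse.
    # com.hadoop.compression.lzo.LzoCodec comes from the external hadoop-lzo package.
    hadoop jar hadoop-examples.jar terasort \
      -Dmapred.compress.map.output=true \
      -Dmapred.map.output.compression.codec=com.hadoop.compression.lzo.LzoCodec \
      -Dmapred.job.reuse.jvm.num.tasks=-1 \
      /terasort/input /terasort/output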

• If memory is not a bottleneck, try to eliminate map-side spills by tuning io.sort.mb. At the least, reduce the number of spills.
• If there are no spills, tune io.sort.spill.percent.
• This also helps reduce disk I/O wait time and increase CPU utilization.
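A minimal sketch of the spill-related properties, assuming the larger sort buffer still fits inside the map task heap; all values are illustrative:

    # Hypothetical map-side spill tuning: io.sort.mb must fit within the map task heap (-Xmx).
    hadoop jar hadoop-examples.jar terasort \
      -Dio.sort.mb=256 \
      -Dio.sort.spill.percent=0.95 \
      -Dmapred.child.java.opts=-Xmx1024m \
      /terasort/input /terasort/output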

Best Practices

Hadoop:
• Identify the right number of data disks your job requires.
• Observe Hadoop framework heap usage and GC patterns and lock in heap and GC JVM flags for these processes.
• Starting from the default settings, begin Hadoop configuration tuning.
• Focus on properties shown to have a big impact, such as map/reduce output compression and JVM reuse policy.
• Avoid or eliminate map-side spills.
• Avoid or eliminate reduce-side disk I/O by tuning reduce JVM heap size, map output copier threads, merge factor, and tasktracker threads handling reduce-side requests.
• If reduce functionality is not heap-heavy, try to maximize the map output storage buffers.
• Try to maximize CPU usage by tuning the number of map and reduce JVMs.
• Tune other framework-related components such as datanode and namenode handler counts.

JVM:
• Experiment with performance-affecting JVM flags such as biased locking, compressed pointers, AggressiveOpts, etc.
• Perform a thorough Java GC analysis and tune GC-related flags.

Operating System:
• Experiment with various OS-level tunings such as the choice of file system, the choice of I/O scheduler, and limits on OS resources.

BIOS:
• Finally, verify whether any default BIOS settings are negatively affecting performance of your Hadoop jobs.

Cluster A
• 6 DNs, 1 NN: 2 chips/6 cores per chip: AMD Opteron™ 2435 @ 2.6 GHz
• 16 GB DDR2-800 RAM per node
• 6 x 1 TB Samsung Spinpoint F3 @ 7200 RPM
• Ubuntu 11.04 Server x64, Oracle JDK 6 update 25 x64
• Dataset size: 64 GB

Cluster B
• 5 DNs: 2 chips/4 cores per chip: AMD Opteron™ 2356 @ 2.3 GHz
• 1 NN: 4 chips/4 cores per chip: AMD Opteron™ 8356 @ 2.6 GHz
• 64 GB DDR2-667 and DDR2-800 RAM
• 6 x 1 TB Samsung Spinpoint F3 @ 7200 RPM
• Ubuntu 11.04 Server x64, Oracle JDK 6 update 25 x64
• Dataset size: 1 TB

• Experiment with the AggressiveOpts flag.
• Experiment with the UseCompressedStrings flag.
• Experiment with the UseStringCache flag.
• Experiment with the UseNUMA flag.
• Experiment with the UseLargePages flag.

• Verify whether the JVM is running out of code cache. Increase the code cache size if necessary.
• Perform detailed GC log analysis and tuning.
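GC log collection for task JVMs could be wired up along these lines; the log path is a placeholder (concurrent tasks on one node will overwrite the same file), and the code cache size shown is illustrative only:

    # Hypothetical example: emit GC logs and grow the code cache for task JVMs.
    # /tmp/task_gc.log is a placeholder path; all values are illustrative.
    hadoop jar hadoop-examples.jar terasort \
      -Dmapred.child.java.opts="-Xmx1024m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/task_gc.log -XX:ReservedCodeCacheSize=64m" \
      /terasort/input /terasort/output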

• By default, every file read operation triggers a disk write operation to maintain the last access time.
• Disable this behavior using the noatime and nodiratime file-system attributes.
• Experiment with other FS attribute tunings such as extent, flex_bg, barrier, etc.
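For example, a Hadoop data disk could be remounted with access-time updates disabled; the mount point below is a placeholder, and the same options would go into /etc/fstab to persist across reboots:

    # Hypothetical remount of a Hadoop data disk with access-time updates disabled.
    # /data/1 is a placeholder mount point; add the same options to /etc/fstab to persist.
    mount -o remount,noatime,nodiratime /data/1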

• Linux kernels support four different I/O schedulers: CFQ, deadline, noop, and anticipatory.
• Experiment with different choices of I/O scheduler, especially CFQ and deadline.
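The scheduler can be switched per device at run time through sysfs; sdb is a placeholder device name, and the choice should be made persistent through your distribution's boot configuration:

    # Show the available schedulers (the active one is in brackets), then switch to deadline.
    cat /sys/block/sdb/queue/scheduler
    echo deadline > /sys/block/sdb/queue/scheduler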

Measured effects on execution time (baseline configuration = 100% in each comparison):

• Effect of the noatime file-system attribute: noatime disabled 100.00%, noatime enabled 85.00%.
• Effect of I/O scheduler choices: "deadline" scheduler 100.00%, "CFQ" scheduler 85.00%.
• Effect of the EXT4 file system: data disks using EXT3 100.00%, data disks using EXT4 91.00%.
• Effect of GC tuning: without GC tuning 100.00%, with GC tuning 97.00%.
• Effect of eliminating map-side spills and tuning io.sort.factor: map-side spills occurring 100.00%, map-side spills eliminated 81.80%.
• Effect of enabling map output compression using LZO: compression disabled 100.00%, compression enabled 71.80%.
• Effect of tuning reduce-side configurations: without reduce-side tuning 100.00%, with reduce-side tuning 94.00%.
• Effect of biased locking: biased locking disabled 100.00%, biased locking enabled 94.50%.
• Effect of compressed pointers: compressed pointers disabled 100.00%, compressed pointers enabled 96.80%.
• Effect of JVM aggressive optimizations: without -XX:+AggressiveOpts 100.00%, with -XX:+AggressiveOpts 102.00%.

• Linux OS limits such as the maximum number of open file descriptors and epoll limits can affect performance.
• Experiment with these limits.
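One way to raise the open file descriptor limit for the user running the Hadoop daemons is shown below; the "hadoop" user name is a placeholder, and 16384 corresponds to the 16K setting used in the experiment that follows:

    # Hypothetical commands: check the current limit, then raise it for the "hadoop" user
    # by appending to /etc/security/limits.conf (takes effect at next login).
    ulimit -n
    echo 'hadoop soft nofile 16384' >> /etc/security/limits.conf
    echo 'hadoop hard nofile 16384' >> /etc/security/limits.conf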

Measured effects on execution time (baseline = 100%):

• Effect of Linux OS resource settings: default open FD limit (1024) 100.00%, open FD limit set to 16K 101.00%.
• Effect of power-saving settings in BIOS: power saving enabled in BIOS 100.00%, power saving disabled in BIOS 99.00%.

Measured effects on execution time (baseline = 100%):

• Scaling with the number of data disks: 1 disk 100.00%, 3 disks 51.00%, 5 disks 43.00%.
• Effect of tuning HyperTransport settings: link speed and width set to "auto" 100.00%, link speed set to 2.2 GHz and width set to 16 98.00%.

[Figure: Total percentage gains compared to the baseline execution time for the 64 GB TeraSort on ClusterA and the 1 TB TeraSort on ClusterB, broken down by tuning layer (cluster configuration, BIOS, OS, JVM, Hadoop). Reported bar values include 76.01, 44.58, 1.26, 4.59, 8.30, 0.30, and 3.30.]

© 2011 Advanced Micro Devices, Inc. AMD, the AMD Arrow logo, AMD Opteron and combinations thereof are trademarks of Advanced Micro Devices, Inc. HyperTransport is a trademark of HyperTransport Technology Consortium.