Hadoop Distributions: Bottlenecks and Tuning

Hadoop distributions. Bottlenecks and tuning. Diomin Aliaksey R&D 2014, Minsk

Upload: altoros

Post on 10-May-2015

DESCRIPTION

This presentation by Alexey Diomin, R&D Engineer at Altoros, explains how to spot performance bottlenecks in Hadoop and overviews five approaches to eliminating them.

TRANSCRIPT

Page 1: Hadoop Distributions: Bottlenecks and Tuning

Hadoop distributions. Bottlenecks and tuning.

Diomin Aliaksey, R&D

2014, Minsk

Page 2: Hadoop Distributions: Bottlenecks and Tuning
Page 3: Hadoop Distributions: Bottlenecks and Tuning

Hadoop Matrix

Distribution    Open source   Monitoring   Target group
Apache Hadoop   Yes           X            Developers
Cloudera        Yes           Good         All
Hortonworks     Yes           Good         All
MapR            No            Bad          Enterprise
PivotalHD       No            Bad          Enterprise

Page 4: Hadoop Distributions: Bottlenecks and Tuning

How to find the bottleneck?

Page 5: Hadoop Distributions: Bottlenecks and Tuning

Monitoring & Logs

Page 6: Hadoop Distributions: Bottlenecks and Tuning

Brain

Page 7: Hadoop Distributions: Bottlenecks and Tuning

All stages

Page 8: Hadoop Distributions: Bottlenecks and Tuning

Map stage

Page 9: Hadoop Distributions: Bottlenecks and Tuning

Fetch stage

Page 10: Hadoop Distributions: Bottlenecks and Tuning

Merge stage

Page 11: Hadoop Distributions: Bottlenecks and Tuning

All stages

Page 12: Hadoop Distributions: Bottlenecks and Tuning

All stages

Page 13: Hadoop Distributions: Bottlenecks and Tuning

The most popular approaches

1. Increase size of cluster

2. Increase input block size

3. Increase buffer size

Page 14: Hadoop Distributions: Bottlenecks and Tuning

Popular approach

1. Increase size of cluster

2. Increase input block size

3. Increase buffer size

Page 15: Hadoop Distributions: Bottlenecks and Tuning

Small cluster, slow tasks

Page 16: Hadoop Distributions: Bottlenecks and Tuning

We need more gold…

Page 17: Hadoop Distributions: Bottlenecks and Tuning

Large cluster, slow tasks

Page 18: Hadoop Distributions: Bottlenecks and Tuning

Popular approach

1. Increase size of cluster

2. Increase input block size

3. Increase buffer size

Page 19: Hadoop Distributions: Bottlenecks and Tuning

Increase input block size
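
Larger blocks mean fewer input splits and therefore fewer map tasks to schedule. A rough sketch of that arithmetic, assuming one map task per block-sized split (the usual default for splittable files); the input size and block sizes below are hypothetical and not taken from the slides:

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def map_task_count(input_bytes, block_bytes):
    """Estimate map tasks as the number of block-sized splits (ceiling division)."""
    return -(-input_bytes // block_bytes)


# Hypothetical 10 GiB input file, compared across two block sizes.
tasks_64 = map_task_count(10 * GIB, 64 * MIB)
tasks_256 = map_task_count(10 * GIB, 256 * MIB)

print(tasks_64, tasks_256)
```

Fewer, larger tasks reduce per-task startup overhead, but each task then runs longer and the job has less parallelism to spread across the cluster.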

Page 20: Hadoop Distributions: Bottlenecks and Tuning

Popular approach

1. Increase size of cluster

2. Increase input block size

3. Increase buffer size

Page 21: Hadoop Distributions: Bottlenecks and Tuning

Other techniques

1. Compression

Page 22: Hadoop Distributions: Bottlenecks and Tuning

Other techniques

1. Compression

2. Combiner

Page 23: Hadoop Distributions: Bottlenecks and Tuning

Combiner

Wordcount

Reduce function used as the combiner

combine 1: <a, 1> <b, 1> <a, 1> => <a, 2> <b, 1>

combine 2: <a, 1> <b, 1> => <a, 1> <b, 1>

Reduce: <a, {2, 1}> <b, {1, 1}> => <a, 3> <b, 2>
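
The wordcount flow can be simulated in plain Python. This is only a sketch of the idea, not Hadoop API code; the function names `combine` and `reduce_counts` are made up for illustration:

```python
from collections import defaultdict


def combine(pairs):
    """Sum counts per word within one mapper's output (same logic as the reducer)."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


def reduce_counts(combined_outputs):
    """Group the combined pairs by word across mappers, then sum each group."""
    grouped = defaultdict(list)
    for output in combined_outputs:
        for word, count in output:
            grouped[word].append(count)
    return {word: sum(counts) for word, counts in grouped.items()}


mapper1 = [("a", 1), ("b", 1), ("a", 1)]
mapper2 = [("a", 1), ("b", 1)]

c1 = combine(mapper1)
c2 = combine(mapper2)
result = reduce_counts([c1, c2])
print(c1, c2, result)
```

Reusing the reducer works here because addition is associative and commutative: summing partial sums gives the same total as summing the raw counts, while fewer pairs cross the network.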

Page 24: Hadoop Distributions: Bottlenecks and Tuning

Combiner

Mean

combine 1: <k, 40> <k, 30> <k, 20> => <k, 30>

combine 2: <k, 2> <k, 8> => <k, 5>

Reduce: <k, {30, 5}> => <k, 17.5>

Page 25: Hadoop Distributions: Bottlenecks and Tuning

Combiner

Mean

combine 1: <k, 40> <k, 30> <k, 20> => <k, 30>

combine 2: <k, 2> <k, 8> => <k, 5>

Reduce: <k, {30, 5}> => <k, 17.5>

(40 + 30 + 20 + 2 + 8)/5 = 20 ≠ 17.5
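
A quick check in plain Python (a sketch, not Hadoop code) shows why the reduce function cannot simply be reused as the combiner for a mean: averaging the per-split averages ignores how many values each split contributed.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


split1 = [40, 30, 20]
split2 = [2, 8]

# Naive approach: each combiner emits its local mean, the reducer averages those.
naive = mean([mean(split1), mean(split2)])

# Correct answer: the mean over all values at once.
true_mean = mean(split1 + split2)

print(naive, true_mean)
```

The two results differ because the first split holds three values and the second only two, yet the naive reducer weights both local means equally.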

Page 26: Hadoop Distributions: Bottlenecks and Tuning

Combiner

Mean

combine 1:

<k, <40, 1>> <k, <30, 1>> <k, <20, 1>> => <k, <90, 3>>

combine 2:

<k, <2, 1>> <k, <8, 1>> => <k, <10, 2>>

Reduce: <k, {<90, 3>, <10, 2>}> => <k, 20>

(90 + 10)/(3 + 2) = 20
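
The (sum, count) approach can be sketched in plain Python as follows. The helper names `to_pair`, `combine`, and `reduce_mean` are hypothetical and stand in for the map, combine, and reduce steps; this is not Hadoop API code.

```python
def to_pair(value):
    """Map step: wrap each value as a (sum, count) pair."""
    return (value, 1)


def combine(pairs):
    """Combine step: add sums and counts component-wise."""
    total = sum(s for s, _ in pairs)
    count = sum(c for _, c in pairs)
    return (total, count)


def reduce_mean(partials):
    """Reduce step: merge the partial pairs, then divide once at the end."""
    total, count = combine(partials)
    return total / count


c1 = combine([to_pair(v) for v in (40, 30, 20)])
c2 = combine([to_pair(v) for v in (2, 8)])
result = reduce_mean([c1, c2])
print(c1, c2, result)
```

Deferring the division to the reducer keeps the combine step associative, so the result does not depend on how the values were split across mappers.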

Page 27: Hadoop Distributions: Bottlenecks and Tuning
