hadoop tuning guide-version5


    8/11/2019 Hadoop Tuning Guide-Version5

    Hadoop Performance Tuning Guide

    Rev 1.0, October 2012


    © 2012 Advanced Micro Devices, Inc. All rights reserved.



    REVISION HISTORY
    1.0 INTRODUCTION
        1.1 INTENDED AUDIENCE
        1.2 CHALLENGES INVOLVED IN TUNING HADOOP
        1.3 MONITORING AND PROFILING TOOLS
        1.4 METHODOLOGY AND EXPERIMENT SETUP
    2.0 GETTING STARTED
        2.1 CORRECTNESS OF HARDWARE SETUP
        2.2 UPGRADING SOFTWARE COMPONENTS
        2.3 PERFORMING STRESS TESTS
        2.4 ENSURING HADOOP JOB COMPLETION
            2.4.1 OS PARAMETERS
            2.4.2 HADOOP PARAMETERS
    3.0 PERFORMANCE TUNING
        3.1 HADOOP CONFIGURATION TUNING
            3.1.1 BASELINE CONFIGURATION
            3.1.2 DATA DISK SCALING
            3.1.3 COMPRESSION
            3.1.4 JVM REUSE POLICY
            3.1.5 HDFS BLOCK SIZE
            3.1.6 MAP SIDE SPILLS
            3.1.7 COPY/SHUFFLE PHASE TUNING
            3.1.8 REDUCE SIDE SPILLS
            3.1.9 POTENTIAL LIMITATIONS
        3.2 JVM CONFIGURATION TUNING
            3.2.1 JVM FLAGS
            3.2.2 JVM GARBAGE COLLECTION
        3.3 OS CONFIGURATION TUNING
            3.3.1 TRANSPARENT HUGE PAGES
            3.3.2 FILE SYSTEM CHOICE AND ATTRIBUTES
            3.3.3 IO SCHEDULER CHOICE
    4.0 FINAL WORDS AND CONCLUSION
    5.0 RESOURCES
    6.0 REFERENCES


    REVISION HISTORY

    Rev 1.0    October 2012    S J I


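    1.0 INTRODUCTION

    Hadoop [1] is a Java-based framework for applications implemented with the MapReduce programming model. The Hadoop-MapReduce ecosystem software market is forecast to grow at a compound annual growth rate of 60.2% between 2011 and 2016 [2]. Hadoop runs on clusters ranging from a few nodes to thousands of nodes and is used for a wide variety of functions. This guide describes our approach to Hadoop performance tuning; using it, we achieved performance improvements of up to 5.6X. We discuss hardware tuning as well as OS, JVM and Hadoop configuration tuning.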

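    Section 1 introduces the challenges involved in tuning Hadoop, the monitoring and profiling tools we used, and our methodology and experiment setup.

    Section 2 describes how to get started: verifying the hardware setup, upgrading software components, performing stress tests, and adjusting the OS and Hadoop parameters needed to ensure that Hadoop jobs complete.

    Section 3 covers Hadoop, JVM and OS configuration tuning, using TeraSort as the example workload.

    Section 4 presents our conclusions. Sections 5 and 6 list resources and references.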

    1.1 INTENDED AUDIENCE

    H . S A H

    .

    T H .

    1.2 CHALLENGES INVOLVED IN TUNING HADOOP

    Why is tuning Hadoop challenging?

    H S . B

    H . P H H , JVM, OS, , ,

    BIOS . H . O

    H . S . T ,


    . A , . O , H

    .

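    1.3 MONITORING AND PROFILING TOOLS

    Which tools help monitor and profile a Hadoop cluster?

    Ganglia and Nagios: Ganglia [3] and Nagios [4] are cluster monitoring tools that track resource usage, such as CPU utilization, across the nodes.

    The Hadoop framework: Hadoop itself reports job-level and task-level information.

    Linux OS utilities: dstat, vmstat, iostat, netstat and free show per-node resource usage.

    Hadoop Vaidya [5]: a performance diagnostic tool that analyzes Hadoop jobs.

    Java profilers: AMD CodeAnalyst [6] and Oracle Solaris Studio Performance Analyzer [7] can profile the Java code of Hadoop jobs.

    System profilers: Linux perf [8] and OProfile [9].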

    1.4 METHODOLOGY AND EXPERIMENT SETUP

    T H .

    3 AMD OTM . S 10 11 12 . T

    H H

    We used the TeraSort [13] workload with 1TB of input data generated by TeraGen.

    . T

    .

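    The Hadoop cluster used for our experiments was configured as follows:

    4 worker nodes (DataNode and TaskTracker) and 1 master node (NameNode, Secondary NameNode and JobTracker), each with:
    2 x 16-core AMD Opteron™ 6386 SE processors at 2.8GHz
    16 x 8 GB DDR3 1600 MHz ECC RAM
    8 x Toshiba MK2002TSKB 2TB 7200 rpm SATA drives
    1 x LSI MegaRAID SAS 9265-8i RAID controller
    1 x 1Gb Ethernet
    Red Hat Enterprise Linux Server 6.3 (Santiago), kernel 2.6.32-279.5.2.el6.x86_64
    Oracle Java(TM) SE Runtime Environment (build 1.7.0_05-b06), Java HotSpot(TM) 64-Bit Server VM (build 23.1-b03, mixed mode)
    Cloudera Distribution of Hadoop (CDH) 4.0.1 (version 2.0.0-mr1-cdh4.0.1)

    See MAPREDUCE-2374 [18].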


    2.0 GETTING STARTED

    T H , 10 11 H ,

    H . H H / S

    H ?

    B , H :

    V . U . P / . T OS H H

    N H N

    . I .

    2.1 CORRECTNESS OF HARDWARE SETUP

    A , BIOS, , OS DIMM , , .

    T . T

    H .

    T BIOS. B BIOS

    BIOS . I BIOS .

    S RAID/ , , , DIMM . W

    S 2.3. U . O IO

    H . F .

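    See the AMD Opteron™ 6200 Series Processors Linux Tuning Guide [14] for platform-specific recommendations. Settings to verify include DIMM population, DIMM speed and the NUMA configuration in the BIOS and the OS. Memory bandwidth can be checked with STREAM [15]; the AMD Opteron™ 6200 Series Processors Linux Tuning Guide [14] describes how to run STREAM.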

    T , H


    F .

    U , BIOS, .

    P .

    2.2 UPGRADING SOFTWARE COMPONENTS

    T L OS , L ,

    , JDK H/ H H . T

    H H .

    P , L H . D

    OS . T L

    H

    T H . A . I ,

    ISA W H JVM

    H .

    O , L , , ,

    . A , L S 16 LZO 17 /

    .

    T ,

    U L / H .

    U H .

    U JVM H .

    2.3 PERFORMING STRESS TESTS

    S H H . T /

    . W H H . W

    . F :

    STREAM NUMA .

    IO IO .


    N H DFSIO, NNB MRB

    H . T H . O SPEC , SPEC SPEC

    .

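    2.4 ENSURING HADOOP JOB COMPLETION

    Depending on the workload, some OS and Hadoop parameters may need to be changed before Hadoop jobs will complete reliably; otherwise Map/Reduce tasks can fail. In our TeraSort runs, we changed the following parameters: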

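    2.4.1 OS PARAMETERS

    The default maximum number of open file descriptors (FD), set through ulimit, can be lower than the number of FDs Hadoop needs. If the limit is too low, tasks can fail. We set it to 32768.

    In Hadoop clusters, the net.core.somaxconn Linux kernel parameter may also need to be raised. It caps the listen backlog of a socket, and its default value is 128. If the backlog is too small, connection requests can be dropped under load. We set it to 1024.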
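    A minimal Python sketch, assuming a Linux node, that compares the current limits with the values used in our TeraSort setup (32768 open file descriptors, net.core.somaxconn of 1024). The function names are hypothetical. The values are read through the standard resource module and /proc. Run the sketch as the user that starts the Hadoop daemons, because the file descriptor limit is per user. Raising the limits (for example in /etc/security/limits.conf or with sysctl) is left to the administrator.

```python
import resource

# Targets from the TeraSort setup described above.
RECOMMENDED_NOFILE = 32768
RECOMMENDED_SOMAXCONN = 1024


def check_limits(nofile_soft, somaxconn):
    """Return a warning string for every value below the targets above."""
    warnings = []
    unlimited = nofile_soft == resource.RLIM_INFINITY
    if not unlimited and nofile_soft < RECOMMENDED_NOFILE:
        warnings.append(f"open file limit {nofile_soft} < {RECOMMENDED_NOFILE}")
    if somaxconn < RECOMMENDED_SOMAXCONN:
        warnings.append(f"net.core.somaxconn {somaxconn} < {RECOMMENDED_SOMAXCONN}")
    return warnings


def read_somaxconn(path="/proc/sys/net/core/somaxconn"):
    """Read the kernel's listen-backlog cap (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    try:
        somaxconn = read_somaxconn()
    except OSError:
        print("net.core.somaxconn is not readable on this system")
    else:
        for warning in check_limits(soft, somaxconn):
            print("WARNING:", warning)
```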

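    2.4.2 HADOOP PARAMETERS

    Under heavy load, Map/Reduce tasks may not report progress within the interval set by mapred.task.timeout, and Hadoop then kills them. The default is 600 seconds (the value is specified in milliseconds). If tasks are killed for this reason, increase the timeout.

    Similarly, if java.net.SocketTimeoutException errors appear in the logs, increase the read and write socket timeouts set by dfs.socket.timeout and dfs.datanode.socket.write.timeout.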
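    One way to apply such changes is to generate the property entries for the site files. The sketch below uses only the Python standard library. It renders a dictionary in the configuration/property/name/value XML layout that Hadoop's site files use. The timeout values are hypothetical placeholders, not the values from our runs. All three parameters are in milliseconds. In a real cluster, mapred.task.timeout goes in mapred-site.xml and the two dfs.* parameters go in hdfs-site.xml.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(props):
    """Render {name: value} pairs in Hadoop's *-site.xml <configuration> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical example values (not from the guide); all are milliseconds.
timeouts = {
    "mapred.task.timeout": 1200000,               # 20 minutes
    "dfs.socket.timeout": 180000,                 # 3 minutes
    "dfs.datanode.socket.write.timeout": 960000,  # 16 minutes
}

if __name__ == "__main__":
    print(hadoop_config_xml(timeouts))
```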


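    3.0 PERFORMANCE TUNING

    This section describes Hadoop, JVM and OS configuration tuning, using TeraSort as the example workload. The tools described in Section 1.3 can be used to observe the effect of each change.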

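    3.1 HADOOP CONFIGURATION TUNING

    In this section we describe the tuning of Hadoop configuration parameters.

    As the example workload we use TeraSort. In the Map phase, TeraSort reads 1TB of input data and writes about 1TB of intermediate output. The Reduce phase reads this intermediate data and writes 1TB of output. Overall, TeraSort is IO-intensive.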

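    3.1.1 BASELINE CONFIGURATION

    The baseline configuration determines how many Map/Reduce slots each node runs and how much Java heap each Map/Reduce JVM receives. Choosing it means balancing the number of CPU cores, the Java heap needs of the tasks, the number of Map/Reduce slots, the available memory and the IO capacity of each node. One approach is: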

    A. S

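    B. Configure the number of Map and Reduce slots to make maximum use of the available CPU cores.

    C. Configure the Java heap size of the Map and Reduce JVMs.

    The Hadoop parameters involved are mapred.map.tasks, mapred.tasktracker.map.tasks.maximum, mapred.reduce.tasks, mapred.tasktracker.reduce.tasks.maximum, mapred.map.child.java.opts and mapred.reduce.child.java.opts. Using this approach, we arrived at the following baseline configuration:

    A. Use 4 data disks per DataNode.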


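    B. Assign 2 Map slots and 1 Reduce slot per CPU core.

    C. Assign 1GB of Java heap to each Map and Reduce JVM.

    This gives 3 JVMs per core using 3GB of heap in total. Each node has 4GB of RAM per core, so 1GB per core remains for the OS and other processes.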

    . N T S . B H .
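    The baseline arithmetic can be written out as a short Python sketch. It assumes the node configuration from Section 1.4: 2 x 16 cores and 16 x 8 GB of RAM, that is, 32 cores and 128 GB per node. It reads the slot counts as per-core values. The helper name and the result keys other than the Hadoop parameter names are hypothetical.

```python
def baseline_budget(cores, ram_gb, maps_per_core=2, reduces_per_core=1, heap_mb=1024):
    """Derive per-node slot counts and heap usage from per-core rules."""
    map_slots = cores * maps_per_core
    reduce_slots = cores * reduces_per_core
    heap_total_gb = (map_slots + reduce_slots) * heap_mb / 1024
    left_gb = ram_gb - heap_total_gb
    if left_gb <= 0:
        raise ValueError("task heaps alone exceed physical memory")
    return {
        "mapred.tasktracker.map.tasks.maximum": map_slots,
        "mapred.tasktracker.reduce.tasks.maximum": reduce_slots,
        "mapred.map.child.java.opts": f"-Xmx{heap_mb}m",
        "mapred.reduce.child.java.opts": f"-Xmx{heap_mb}m",
        "task_heap_total_gb": heap_total_gb,
        "left_for_os_gb": left_gb,
    }


if __name__ == "__main__":
    for key, value in baseline_budget(cores=32, ram_gb=128).items():
        print(f"{key} = {value}")
```

    With these inputs the sketch yields 64 Map slots, 32 Reduce slots and 96 GB of task heap per node. That leaves 32 GB (1 GB per core) for the OS.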

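    3.1.2 DATA DISK SCALING

    Because TeraSort is IO-intensive, the number of data disks has a large effect on its performance. The directories for intermediate Map output are set with mapred.local.dir. The HDFS metadata and data block directories are set with dfs.name.dir and dfs.data.dir. Each of these parameters accepts a comma-separated list of directories, typically one per disk. Figure 1 shows TeraSort performance with 4, 5 and 7 data disks per DataNode.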

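    Figure 1: TeraSort performance scaling with number of data disks
    (Figure 1 values, normalized TeraSort execution time: 4 disks 100.00%, 5 disks 77.40%, 7 disks 53.61%)

    3.1.3 COMPRESSION

    Hadoop supports compression at three levels: input data, intermediate Map output and Reduce output. This section focuses on Map output compression. Compressing the Map output reduces the data written to local disks and transferred over the network during the shuffle. The cost is extra CPU time for compression and decompression. The relevant parameters are mapred.compress.map.output, mapred.map.output.compression.codec, mapred.output.compress, mapred.output.compression.type and mapred.output.compression.codec. The choice of codec determines how much CPU time is spent to save IO.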
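    The CPU-for-IO trade-off can be observed with a small Python experiment. Snappy and LZO are not in the Python standard library, so this sketch substitutes zlib at its fastest level. zlib is typically slower than both codecs and compresses harder, so the numbers only show the direction of the trade-off, not the results in Figure 2. The synthetic 100-byte records are a hypothetical stand-in for TeraSort data.

```python
import time
import zlib


def compression_tradeoff(data, level=1):
    """Return (compressed/original size ratio, CPU seconds spent compressing).

    zlib is a stand-in here for Hadoop codecs such as Snappy or LZO.
    """
    start = time.process_time()
    compressed = zlib.compress(data, level)
    cpu_seconds = time.process_time() - start
    assert zlib.decompress(compressed) == data  # verify the round trip
    return len(compressed) / len(data), cpu_seconds


def synthetic_records(n):
    """n hypothetical 100-byte records: 10-byte key, 88-byte filler, CRLF."""
    return b"".join(b"%010d" % i + b"ABCD" * 22 + b"\r\n" for i in range(n))


if __name__ == "__main__":
    data = synthetic_records(10_000)
    ratio, cpu = compression_tradeoff(data)
    print(f"{len(data)} bytes -> ratio {ratio:.3f}, {cpu * 1000:.1f} ms CPU")
```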

    Figure 2 shows the effect of Map output compression on TeraSort performance.

    Figure 2: Effect of Map output compression using different codecs on TeraSort performance

    W 28% S . T S F . S (1.5%) LZO

    .

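    3.1.4 JVM REUSE POLICY

    Hadoop's JVM reuse policy is controlled by mapred.job.reuse.jvm.num.tasks. By default (value 1), Hadoop launches a new JVM for every Map/Reduce task. Setting the value to -1 lets a JVM run an unlimited number of tasks of the same job. Every JVM launch costs JVM startup time, and each new JVM has to recompile hot Java code with the JIT compiler. Reusing JVMs avoids these repeated costs. We observed about a 2% improvement with JVM reuse.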
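    A minimal Python sketch of what the setting changes. It counts JVM launches in one task slot for a given number of tasks, where 1 means a new JVM per task and -1 means unlimited reuse within the job. The function is hypothetical. It ignores details such as slots that run tasks from different jobs, which never share a JVM.

```python
import math


def jvm_launches_per_slot(tasks_per_slot, reuse_num_tasks):
    """JVM launches in one slot under mapred.job.reuse.jvm.num.tasks."""
    if tasks_per_slot < 0:
        raise ValueError("tasks_per_slot must be non-negative")
    if reuse_num_tasks == -1:  # one JVM serves all of the job's tasks in this slot
        return 1 if tasks_per_slot else 0
    if reuse_num_tasks < 1:
        raise ValueError("expected -1 or a positive integer")
    return math.ceil(tasks_per_slot / reuse_num_tasks)
```

    For example, a slot that runs 15 tasks of a job starts 15 JVMs with the default value of 1. It starts 5 JVMs with a value of 3, and a single JVM with -1.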

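    (Figure 2 values, normalized TeraSort execution time: no compression 100.00%, Snappy compression 64.14%, LZO compression 64.16%)

    3.1.5 HDFS BLOCK SIZE

    Each Map task processes one input split. The split size is derived from mapred.min.split.size, dfs.block.size and mapred.max.split.size, so the HDFS block size largely determines how many Map tasks a job runs. For TeraSort, we varied the HDFS block size through dfs.block.size. A larger block size reduces the number of Map tasks. However, each Map task then processes more data and needs more memory in the Map JVM to avoid extra spills (see Section 3.1.6). Figure 3 compares TeraSort performance with different HDFS block sizes. We used a 256MB block size, which gave the best result in Figure 3.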
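    For the new-API FileInputFormat, Hadoop computes the split size as max(minSize, min(maxSize, blockSize)). With the default minimum and maximum, this equals the block size. The Python sketch below applies that rule to estimate the Map task count for one large input file. The function names are hypothetical. The sketch ignores FileInputFormat's allowance that lets the last split be up to 10% larger, so the estimate can be one too high. TeraGen output spread over many files would be split per file. Treating TeraSort's "1TB" as 10^12 bytes is an assumption.

```python
import math

LONG_MAX = 2**63 - 1  # Java Long.MAX_VALUE, the default maximum split size


def compute_split_size(block_size, min_split_size=1, max_split_size=LONG_MAX):
    """max(minSize, min(maxSize, blockSize)), as in FileInputFormat."""
    return max(min_split_size, min(max_split_size, block_size))


def estimate_map_tasks(input_bytes, block_size, **split_limits):
    """Approximate Map task count for a single large input file."""
    split = compute_split_size(block_size, **split_limits)
    return math.ceil(input_bytes / split)


if __name__ == "__main__":
    one_tb = 10**12  # assumption: 10^10 records of 100 bytes
    for mb in (64, 128, 256, 384):
        print(f"{mb}MB blocks -> ~{estimate_map_tasks(one_tb, mb << 20)} Map tasks")
```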

    Figure 3: TeraSort performance comparison with different HDFS block sizes

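    (Figure 3 values, normalized TeraSort execution time: 64MB 100.00%, 128MB 87.23%, 256MB 80.37%, 384MB 82.06%)

    3.1.6 MAP SIDE SPILLS

    Map output is first written to an in-memory buffer in the Map JVM. The buffer size is set with io.sort.mb and defaults to 100 MB. Part of this buffer holds record metadata. By default this is 0.05 (5%) of io.sort.mb, that is, 5MB, as set by io.sort.record.percent. Each record needs 16 bytes of metadata, so 5MB can track 327680 records. When either part of the buffer fills past the fraction set by io.sort.spill.percent (default 0.8, i.e. 80%), the buffered records are sorted and spilled to disk.

    S M ( ) . T M 304 .

    Every spill beyond the final one adds disk IO, because the spill files must be read back and merged. To avoid extra spills, increase io.sort.mb and raise io.sort.spill.percent up to 0.99 so that each Map task's output fits in the buffer. For example, with a 256MB HDFS block size and 100-byte records, we set io.sort.mb to 316MB, io.sort.record.percent to 0.162 (16.2% of 316MB) and io.sort.spill.percent to 0.99 (99% of the buffer). We observed a 2.64% improvement with these Map-side settings.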
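    The effect of these settings can be estimated with a simplified Python model of the MR1 Map-side buffer. In the model, io.sort.record.percent of the buffer holds 16-byte metadata entries and the rest holds record data. A spill starts when either part passes io.sort.spill.percent. The model is our own simplification: it ignores serialization overhead, the overlap of spilling with collection, and combiners. It assumes 100-byte TeraSort records and one 256MB input split per Map task, and the function name is hypothetical.

```python
import math

MB = 1 << 20
META_BYTES_PER_RECORD = 16  # accounting bytes per record, as described above


def estimate_spills(io_sort_mb, record_percent, spill_percent, num_records, record_bytes):
    """Approximate number of spill files one Map task writes (at least 1)."""
    total = io_sort_mb * MB
    meta_bytes = int(total * record_percent)
    data_bytes = total - meta_bytes
    meta_capacity = meta_bytes // META_BYTES_PER_RECORD  # records trackable
    # Records collected before either soft limit triggers a spill.
    per_spill = min(int(meta_capacity * spill_percent),
                    int(data_bytes * spill_percent) // record_bytes)
    if per_spill <= 0:
        raise ValueError("buffer too small to hold a single record")
    return max(1, math.ceil(num_records / per_spill))


if __name__ == "__main__":
    records = (256 * MB) // 100  # 100-byte records in one 256MB split
    print("defaults:", estimate_spills(100, 0.05, 0.80, records, 100))
    print("tuned:   ", estimate_spills(316, 0.162, 0.99, records, 100))
```

    Under these assumptions, the defaults give about 11 spills per Map task and the tuned values give a single spill. This matches the intent of sizing io.sort.mb to the 256MB block.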

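    3.1.7 COPY/SHUFFLE PHASE TUNING

    In the copy/shuffle phase, Reduce tasks fetch Map outputs over HTTP from the TaskTrackers. The following parameters affect this phase:

    mapred.reduce.parallel.copies sets the number of parallel transfers each Reduce task runs during the copy phase. The default is 5.

    tasktracker.http.threads sets the number of worker threads in each TaskTracker's HTTP server, which serves Map outputs to the Reduce tasks. The default is 40.

    Consider also increasing dfs.datanode.handler.count, dfs.namenode.handler.count and mapred.job.tracker.handler.count, which set the number of server threads of the DataNode, NameNode and JobTracker.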
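    A rough Python sketch for sizing tasktracker.http.threads against mapred.reduce.parallel.copies. The heuristic is our assumption, not a rule from Hadoop. It assumes every running Reduce task keeps its full number of parallel copies in flight and that fetches spread evenly over the TaskTrackers. Under those assumptions, each TaskTracker serves about running_reduces x parallel_copies / nodes concurrent requests. The example inputs use the 4 worker nodes and 32 Reduce slots per node of our baseline.

```python
import math


def fetches_per_tasktracker(nodes, reduce_slots_per_node, parallel_copies, reduce_tasks=None):
    """Estimated concurrent Map-output fetches each TaskTracker must serve."""
    running = nodes * reduce_slots_per_node
    if reduce_tasks is not None:
        running = min(running, reduce_tasks)
    return math.ceil(running * parallel_copies / nodes)


if __name__ == "__main__":
    demand = fetches_per_tasktracker(nodes=4, reduce_slots_per_node=32, parallel_copies=5)
    default_threads = 40  # tasktracker.http.threads default
    print(f"~{demand} concurrent fetches per TaskTracker vs {default_threads} HTTP threads")
```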

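    3.1.8 REDUCE SIDE SPILLS

    The Reduce phase can also generate a lot of IO. After the Map tasks finish, each Reduce task copies (shuffles) its share of the Map outputs from the TaskTrackers and merges them. It then runs the Reduce function and writes its output to HDFS. The buffers involved live in the Java heap of the Reduce JVM, not the Map JVM.

    Fetched Map outputs are held in an in-memory buffer whose size is set by mapred.job.shuffle.input.buffer.percent; Map outputs that do not fit are written to disk. The default value of mapred.job.shuffle.input.buffer.percent is 0.70, meaning that 70% of the Reduce JVM heap can hold Map outputs. When this buffer reaches the threshold set by mapred.job.shuffle.merge.percent (default 0.66), the in-memory Map outputs are merged and spilled to disk. Once the shuffle finishes, mapred.job.reduce.input.buffer.percent sets the fraction of the Reduce JVM heap that may keep Map outputs in memory while the Reduce function runs. Its default of 0.0 means all Map outputs are written to disk before the Reduce function starts, so raising it reduces IO in the Reduce phase. Values up to 1.0 are allowed; we set mapred.job.reduce.input.buffer.percent to 0.8.

    A 2 M . 1 M M J R JVM .

    With these changes we observed a 10% improvement in TeraSort performance, as shown in Figure 4.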
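    A small Python sketch of how the three fractions translate into memory for a Reduce JVM. It assumes a 1GB heap, the baseline setting. It simply multiplies the fractions, which simplifies Hadoop's internal accounting. The function name and result keys are hypothetical.

```python
def reduce_memory_budget(heap_mb,
                         shuffle_input_buffer_percent=0.70,
                         shuffle_merge_percent=0.66,
                         reduce_input_buffer_percent=0.0):
    """Approximate MB of Reduce JVM heap governed by each shuffle parameter."""
    shuffle_buffer = heap_mb * shuffle_input_buffer_percent  # holds fetched Map outputs
    merge_trigger = shuffle_buffer * shuffle_merge_percent   # in-memory merge starts here
    retained = heap_mb * reduce_input_buffer_percent         # may stay in memory for reduce()
    return {
        "shuffle_buffer_mb": shuffle_buffer,
        "merge_trigger_mb": merge_trigger,
        "retained_for_reduce_mb": retained,
    }


if __name__ == "__main__":
    print("defaults:", reduce_memory_budget(1024))
    print("reduce.input.buffer.percent=0.8:",
          reduce_memory_budget(1024, reduce_input_buffer_percent=0.8))
```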

    Figure 4: Effect of tuning Reduce phase Hadoop parameters on TeraSort

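    (Figure 4 values, normalized TeraSort execution time: without Reduce-side tuning 100.00%, with Reduce-side tuning 90.16%)

    3.1.9 POTENTIAL LIMITATIONS

    Aggressive tuning can push Hadoop or the underlying system past its limits. For example, with CDH 4.0.1 we encountered Map/Reduce task failures.

    A related parameter is mapred.max.tracker.failures. Once a TaskTracker has this many failed tasks of a given job (default 4), it is blacklisted for that job, and no more tasks of that job are assigned to it. This reduces the capacity available to the job. If task failures are expected and transient, consider increasing mapred.max.tracker.failures.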
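    A toy Python model of the per-job blacklisting described above, to show why a few failed tasks can shrink a small cluster. On our 4-node cluster, losing one TaskTracker removes a quarter of the slots for that job. Treating the fourth failure as the trigger is our reading of the parameter description. Hadoop's actual scheduler logic, including cluster-wide blacklisting, is more involved. The class and method names are hypothetical.

```python
from collections import Counter


class TrackerBlacklist:
    """Toy model of per-job TaskTracker blacklisting.

    Once a TaskTracker has max_tracker_failures failed tasks of a given job,
    no more tasks of that job are scheduled on it. Other jobs are unaffected.
    """

    def __init__(self, max_tracker_failures=4):
        self.max_tracker_failures = max_tracker_failures
        self.failures = Counter()  # (job_id, tracker) -> failed task count

    def record_failure(self, job_id, tracker):
        self.failures[(job_id, tracker)] += 1

    def is_blacklisted(self, job_id, tracker):
        return self.failures[(job_id, tracker)] >= self.max_tracker_failures

    def usable_trackers(self, job_id, trackers):
        return [t for t in trackers if not self.is_blacklisted(job_id, t)]
```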

    3.2 JVM CONFIGURATION TUNING

    After tuning the Hadoop configuration, we tuned the JVM.

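    3.2.1 JVM FLAGS

    The JVM provides many command-line flags that affect performance. We evaluated the following flags:

    -XX:+AggressiveOpts enables JVM optimizations that are expected to become defaults in future JVM releases. We observed a 3.4% improvement with it. Note that AggressiveOpts is not enabled by default in Oracle JDK7 Update 5.

    -XX:+UseCompressedOops enables Compressed Ordinary Object Pointers (OOPs). In a 64-bit JVM, compressed OOPs shrink Java object references, which reduces the memory footprint of the Java heap. We observed about a 1% improvement in TeraSort. Note that UseCompressedOops is enabled by default in Oracle JDK 7 Update 5.

    -XX:+UseBiasedLocking enables biased locking in the Oracle HotSpot JDK. Biased locking makes repeated, uncontended lock acquisition by the same thread cheaper. We observed about a 1% improvement in TeraSort. Note that UseBiasedLocking is enabled by default in Oracle JDK 7 Update 5.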

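    3.2.2 JVM GARBAGE COLLECTION

    Garbage collection (GC) activity in the Map and Reduce JVMs also affects performance.

    Figure 5 shows the effect of JVM command-line options tuning on TeraSort.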


    Figure 5: Effect of JVM command-line options tuning on TeraSort

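    3.3 OS CONFIGURATION TUNING

    In this section we describe OS configuration tuning for Hadoop.

    3.3.1 TRANSPARENT HUGE PAGES

    Transparent Huge Pages (THP) is a memory management feature that is enabled by default in RHEL 6.2. However, THP can cause high CPU utilization with Hadoop workloads.

    W 66% T S THP .

    This issue is described under "Known Issues and Work Arounds in CDH4" in the CDH 4.0.1 release notes [19].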

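    3.3.2 FILE SYSTEM CHOICE AND ATTRIBUTES

    The file system (FS) used on the Linux data disks, and its attributes, affect Hadoop IO performance. On RHEL 6.3, we compared the EXT4 FS with the EXT3 FS.

    I , . T ( FS . W 29% T FS .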

    (Figure 5 values, normalized TeraSort execution time: without JVM tuning 100.00%, with JVM tuning 96.74%)


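    3.3.3 IO SCHEDULER CHOICE

    Most Linux kernels provide 4 IO schedulers: CFQ, noop, Deadline and anticipatory. The default IO scheduler varies between Linux distributions such as Ubuntu 11.04 and RHEL 6.3; on RHEL 6.3 the default is CFQ.

    A :// . . / . ? =2188323 15% CFQ .

    Figure 6 shows the combined effect of the OS configuration tuning on TeraSort performance.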
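    The active scheduler is exposed per block device in sysfs, with the current choice in brackets, for example "noop anticipatory deadline [cfq]". The Python sketch below lists the active and available schedulers for every device, which is useful before and after switching schedulers on the data disks. The function names are hypothetical, and changing the scheduler (which requires root) is not shown.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split 'noop anticipatory deadline [cfq]' into (active, available)."""
    available, active = [], None
    for token in text.split():
        name = token.strip("[]")
        available.append(name)
        if token.startswith("["):
            active = name
    return active, available


def schedulers_by_device(sys_block="/sys/block"):
    """Map each block device to its (active, available) IO schedulers."""
    base = Path(sys_block)
    if not base.is_dir():
        return {}
    result = {}
    for sched_file in sorted(base.glob("*/queue/scheduler")):
        device = sched_file.parent.parent.name
        result[device] = parse_scheduler(sched_file.read_text())
    return result


if __name__ == "__main__":
    for device, (active, available) in schedulers_by_device().items():
        print(f"{device}: active={active} available={available}")
```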

    Figure 6: Effect of OS configuration tuning on TeraSort performance

    (Figure 6 values, normalized TeraSort execution time: without OS tuning 100.00%, with OS tuning 77.39%)


    4.0 FINAL WORDS AND CONCLUSION

    B D H . T H . T

    H H

    . U H

    H .

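    Overall, tuning improved TeraSort performance by 5.6X compared with the baseline configuration with 4 data disks (Figure 1). Compared with the best baseline result in Figure 1 (7 data disks), the improvement was 3X.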
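    The two headline numbers can be cross-checked against Figure 1 with a line of arithmetic. The Python sketch below assumes that the Figure 1 percentages are normalized execution times and that the 3X result is measured against the 7-disk baseline. Under those assumptions, a 5.6X gain over the 4-disk baseline implies about 3X over the 7-disk baseline.

```python
def speedup_vs_other_baseline(speedup_vs_ref, other_time_vs_ref):
    """Speedup of the tuned run over a second baseline.

    If the tuned run takes T_ref / s and the second baseline takes r * T_ref,
    the speedup over the second baseline is s * r.
    """
    return speedup_vs_ref * other_time_vs_ref


# Figure 1: 7 data disks run in 53.61% of the 4-disk baseline time.
seven_disk_time = 0.5361
# Overall tuning result relative to the 4-disk baseline.
tuned_speedup_vs_4_disks = 5.6

if __name__ == "__main__":
    implied = speedup_vs_other_baseline(tuned_speedup_vs_4_disks, seven_disk_time)
    print(f"implied speedup over the 7-disk baseline: {implied:.2f}X")
```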

    Figure 7 summarizes these improvements:

    Figure 7: Total improvements in TeraSort performance through configuration tuning

    S H .

    (Figure 7 chart: bars labeled Baseline (4 disks), Baseline (7 disks) and Tuned; values 5.60X, 3.00X and 1.00X)


    5.0 RESOURCES

    AMD D C J Z ::// . . /R / / /P / .

    AMD D T : :// . . / AMD O TM 6200 S P L T G ::// . . / /2012/04/25/

    %E2%80%9C %E2%80%9D / 5 5% H :

    :// . . / /2011/10/20/5 5 / J G C C T G A H T S

    :// . . /R / / /P / . A :

    :// . . / . ? =2188323 A H P M B P

    : :// . . / /2011/07/12/%E2%80%93 /

    M R E T ::// . . / /2012/06/06/ %E2%80%93/

    M R O M R E ::// . . / /2012/05/29/

    / O J A P S B P :

    :// . . / /2012/04/25/%E2%80%9C %E2%80%9D /


    6.0 REFERENCES

    [1] "Welcome to Apache Hadoop!," [Online]. Accessed 07 Oct 201

    [2] "IDC Releases First Worldwide Hadoop-MapReduce Ecosystem Software Forecast, Strong Growth Will Continue to Accelerate as Talent and Tools Develop, US23471212," [Online]. Accessed 07 Oct 2012.

    [3] "Ganglia Monitoring System," [Online]. Accessed 07 Oct 2012.

    [4] "Nagios - The Industry Standard in IT Infrastructure Monitoring," [Online]. Accessed 07 Oct 2012.

    [5] "Vaidya Guide," [Online]. Accessed 07 Oct 2012.

    [6] "AMD CodeAnalyst Performance Analyzer | AMD Developer Central," [Online]. Accessed 07 Oct 2012.

    [7] "Oracle Solaris Studio 12.2: Performance Analyzer," [Online]. Accessed 07 Oct 2012.

    [8] "Main Page - Perf Wiki," [Online]. Accessed Oct 2012.

    [9] "About OProfile," [Online]. Accessed 07 Oct

    [10] T. White, Hadoop: The Definitive Guide, Sebastopol: O'Reilly Media, Inc., 2010.

    [11] "7 Tips for Improving MapReduce Performance | Apache Hadoop for the Enterprise | Cloudera," [Online]. Accessed 07 Oct 2012.

    [12] "D B : S H P 1: M I," [Online]. Accessed 07 Oct 2012.

    [13] O. O'Malley. [Online]. Accessed 07 Oct 201


    [14] "Developer Guides & Manuals | AMD Developer Central," document 51803A, Opteron Linux Tuning Guide. [Online]. Accessed 07 Oct 2012.

    [15] "MEMORY BANDWIDTH: STREAM BENCHMARK PERFORMANCE RESULTS," [Online]. Accessed 09 Oct 2012.

    [16] "5 5% H | AMD Developer Central," [Online]. Accessed 09 Oct 2012.

    [17] " . : LZO ," [Online]. Accessed 09 Oct 2012.

    [18] "[#MAPREDUCE-2374] "Text File Busy" errors launching MR tasks - ASF JIRA," [Online]. Accessed 09 Oct 2012.

    [19] "CDH4 Release Notes - Cloudera Support," [Online]. Accessed 09 Oct 2012.