Performance Issues on Hadoop Clusters

Jiong Xie

Advisor: Dr. Xiao Qin

Committee Members:

Dr. Cheryl Seals

Dr. Dean Hendrix

University Reader:

Dr. Fa Foster Dai

04/11/23 1

Overview of My Research

• Data locality: Data Placement on Heterogeneous Cluster [HCW 10]
• Data movement: Prefetching Data from Disk to Memory [Submit to IPDPS]
• Data shuffling: Reduce network congestion [To Be Submitted]

Data-Intensive Applications

Data-Intensive Applications (cont.)

Background

• The MapReduce programming model is growing in popularity
• Hadoop is used by Yahoo, Facebook, and Amazon

Hadoop Overview: MapReduce Running System

(J. Dean and S. Ghemawat. Mapreduce: Simplified data processing on large clusters. OSDI ’04, pages 137–150)

Hadoop Distributed File System

(http://lucene.apache.org/hadoop)

Motivations

• MapReduce provides
  – Automatic parallelization & distribution
  – Fault tolerance
  – I/O scheduling
  – Monitoring & status updates

Existing Hadoop Clusters

• Observation 1: Cluster nodes are dedicated
  – Data locality issues
  – Data transfer time
• Observation 2: The number of nodes is increasing
  – Scalability issues
  – Shuffling overhead goes up

Proposed Solutions

[Figure: MapReduce dataflow from Input through Map and Reduce tasks to Output, annotated with the three proposed solutions]
• P1: Data placement
• P2: Prefetching
• P3: Preshuffling

Solutions

• P1: Data placement (offline, distributed data, heterogeneous node)
• P2: Prefetching (online, data preloading)
• P3: Preshuffling (intermediate data movement, reducing traffic)

Improving MapReduce Performance through Data Placement in Heterogeneous Hadoop Clusters

Motivational Example

[Figure: task processing timeline for three nodes; x-axis: Time (min)]
• Node A (fast): 1 task/min
• Node B (slow): 2x slower
• Node C (slowest): 3x slower

The Native Strategy

[Figure: timeline of loading, transferring, and processing on Node A, Node B, and Node C; x-axis: Time (min); task counts shown: 3 tasks, 2 tasks, 6 tasks]

Our Solution: Reducing Data Transfer Time

[Figure: timeline of loading, transferring, and processing on Node A', Node B', and Node C', compared with Node A; x-axis: Time (min); task counts shown: 3 tasks, 2 tasks, 6 tasks]

Challenges

• Does the distribution strategy depend on the application?
• Initialization of data distribution
• The data skew problems
  – New data arrival
  – Data deletion
  – Data updating
  – New joining nodes

Measure Computing Ratios

• Computing ratio
• Fast machines process large data sets

[Figure: processing timeline over Time for Node A (1 task/min), Node B (2x slower), and Node C (3x slower)]

Measuring Computing Ratios

Node     Response time (s)   Ratio   # of File Fragments   Speed
Node A   10                  1       6                     Fastest
Node B   20                  2       3                     Average
Node C   30                  3       2                     Slowest

1. Run an application, collect response time

2. Set ratio of a node offering the shortest response time as 1

3. Normalize ratios of other nodes

4. Calculate the least common multiple of these ratios

5. Determine the amount of data processed by each node
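The five steps above can be traced with the numbers from the table. The following is a minimal sketch, not the actual implementation: the function name `fragments_per_round` is hypothetical, and it assumes the normalized ratios come out as whole numbers, as they do for Nodes A, B, and C.

```python
from fractions import Fraction
from math import lcm  # variadic math.lcm requires Python 3.9+


def fragments_per_round(response_times):
    """Derive per-node file fragment counts from measured response times."""
    # Steps 2-3: the fastest node gets ratio 1; the others are scaled to it.
    fastest = min(response_times.values())
    ratios = {node: Fraction(t, fastest) for node, t in response_times.items()}
    # Assumption of this sketch: every normalized ratio is an integer.
    if any(r.denominator != 1 for r in ratios.values()):
        raise ValueError("this sketch assumes integer ratios")
    # Step 4: least common multiple of the ratios.
    common = lcm(*(r.numerator for r in ratios.values()))
    # Step 5: a node with ratio r handles common / r fragments.
    return {node: common // r.numerator for node, r in ratios.items()}


# Step 1: response times (seconds) collected from one application run.
response_times = {"Node A": 10, "Node B": 20, "Node C": 30}
fragments = fragments_per_round(response_times)
print(fragments)
```

With the response times from the table, the least common multiple of the ratios 1, 2, and 3 is 6, which gives the fragment counts 6, 3, and 2 listed above.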

Initialize Data Distribution

• Input files are split into 64 MB blocks
• Round-robin data distribution algorithm

[Figure: the Namenode splits File1 into blocks 1-9 and a-c and distributes them to Datanodes A, B, and C in portions 3:2:1]
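One possible reading of round-robin distribution with portions 3:2:1 is sketched below. This is an illustration under assumptions, not the HDFS implementation: the function `distribute_blocks` and the block labels are hypothetical, and each round simply hands every node as many blocks as its portion.

```python
def distribute_blocks(blocks, portions):
    """Assign blocks to nodes in rounds, each node taking its portion per round."""
    placement = {node: [] for node in portions}
    pending = iter(blocks)
    done = object()  # sentinel marking that all blocks have been placed
    while True:
        for node, share in portions.items():
            for _ in range(share):
                block = next(pending, done)
                if block is done:
                    return placement
                placement[node].append(block)


# Twelve blocks labelled 1-9 and a-c, as in the figure.
blocks = list("123456789abc")
placement = distribute_blocks(blocks, {"A": 3, "B": 2, "C": 1})
for node, assigned in placement.items():
    print(node, assigned)
```

With twelve blocks, two full rounds place six blocks on A, four on B, and two on C, which preserves the 3:2:1 proportion.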

Data Redistribution

1. Get network topology, ratio, and utilization.
2. Build and sort two lists: under-utilized node list L1 and over-utilized node list L2.
3. Select the source and destination node from the lists.
4. Transfer data.
5. Repeat steps 3 and 4 until the list is empty.

[Figure: the Namenode moves blocks among Datanodes A, B, and C, using lists L1 and L2, until the portions are 3:2:1]
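The redistribution steps above can be sketched as follows. This is a simplified illustration, not the actual algorithm: it ignores network topology, measures utilization only as a block count per node, and the function `rebalance` and its inputs are hypothetical.

```python
def rebalance(block_counts, portions):
    """Plan block transfers so that per-node counts match the given portions."""
    total = sum(block_counts.values())
    weight = sum(portions.values())
    # Target block count per node, proportional to its portion (rounded down).
    target = {n: total * portions[n] // weight for n in block_counts}
    surplus = {n: block_counts[n] - target[n] for n in block_counts}
    # Step 2: L1 holds under-utilized nodes, L2 over-utilized nodes,
    # each sorted so the largest deviation comes first.
    l1 = sorted((n for n in surplus if surplus[n] < 0), key=lambda n: surplus[n])
    l2 = sorted((n for n in surplus if surplus[n] > 0), key=lambda n: -surplus[n])
    moves = []
    while l1 and l2:
        # Step 3: pick a source (over-utilized) and destination (under-utilized).
        src, dst = l2[0], l1[0]
        amount = min(surplus[src], -surplus[dst])
        # Step 4: record the transfer of `amount` blocks.
        moves.append((src, dst, amount))
        surplus[src] -= amount
        surplus[dst] += amount
        # Step 5: drop nodes that reached their target, then repeat.
        if surplus[src] == 0:
            l2.pop(0)
        if surplus[dst] == 0:
            l1.pop(0)
    return moves


# Evenly placed blocks are moved toward portions 3:2:1.
moves = rebalance({"A": 4, "B": 4, "C": 4}, {"A": 3, "B": 2, "C": 1})
print(moves)
```

In this example Node C holds two blocks more than its share and Node A two fewer, so a single transfer from C to A is planned and Node B is left untouched.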

Experimental Environment

Five nodes in a Hadoop heterogeneous cluster

Node     CPU Model          CPU (Hz)    L1 Cache (KB)
Node A   Intel Core 2 Duo   2*1G=2G     204
Node B   Intel Celeron      2.8G        256
Node C   Intel Pentium 3    1.2G        256
Node D   Intel Pentium 3    1.2G        256
Node E   Intel Pentium 3    1.2G        256

Benchmarks

• Grep: a tool searching for a regular expression in a text file

• WordCount: a program used to count words in a text file

• Sort: a program used to list the inputs in sorted order.
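To make the WordCount benchmark concrete, here is a minimal in-process sketch of its map and reduce steps. It does not use the Hadoop API; the function names and the sample input are hypothetical and only illustrate the shape of the computation.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return dict(counts)


lines = ["hadoop map reduce", "map reduce map"]
pairs = [pair for line in lines for pair in map_words(line)]
word_counts = reduce_counts(pairs)
print(word_counts)
```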

Response Time of Grep and WordCount on Each Node

Computing ratio is:
• Application dependent
• Data size independent

Computing Ratio for Two Applications

Computing ratios of the five nodes with respect to the Grep and WordCount applications

Computing Node   Ratios for Grep   Ratios for WordCount
Node A           1                 1
Node B           2                 2
Node C           3.3               5
Node D           3.3               5
Node E           3.3               5

Six Data Placement Decisions

Impact of data placement on performance of Grep

Impact of data placement on performance of WordCount

Summary of Data Placement

P1: Data Placement Strategy
• Motivation: Fast machines process large data sets
• Problem: Data locality issue in heterogeneous clusters
• Contributions: Distribute data according to computing capability
  – Measure computing ratio
  – Initialize data placement
  – Redistribution

Predictive Scheduling and Prefetching for Hadoop clusters

Prefetching

• Goal: Improving performance
• Approach
  – Best effort to guarantee data locality
  – Keeping data close to computing nodes
  – Reducing the CPU stall time

Challenges

• What to prefetch?

• How to prefetch?

• What is the size of blocks to be prefetched?

Dataflow in Hadoop

[Figure: dataflow among HDFS (Block 1, Block 2), map tasks, local file systems, and reduce tasks]
1. Submit job
2. Schedule
3. Read input
4. Run map
5. Heartbeat
6. Next task
7. Read new file

Dataflow in Hadoop

[Figure: dataflow among HDFS (Block 1, Block 2), map tasks, local file systems, and reduce tasks, with prefetching]
1. Submit job
2. Schedule, plus more tasks and metadata
3. Read input
4. Run map
5. Heartbeat
5.1. Read new file
6. Next task
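The idea of reading the next file while the current map task is still running can be sketched with a worker thread and a bounded in-memory buffer. This is only an illustration under assumptions, not the Hadoop mechanism: `load_block` and `run_map` are hypothetical stand-ins for disk reads and map tasks, and the block names are made up.

```python
import queue
import threading


def load_block(block_id):
    """Hypothetical stand-in for reading one input block from disk."""
    return f"data-of-{block_id}"


def run_map(data):
    """Hypothetical map task; here it only measures its input."""
    return len(data)


def prefetcher(block_ids, buffer):
    """Worker thread: load upcoming blocks into memory ahead of the map task."""
    for block_id in block_ids:
        # put() waits while the buffer is full, which bounds memory use.
        buffer.put((block_id, load_block(block_id)))
    buffer.put(None)  # signal that no more blocks will arrive


def process_with_prefetch(block_ids, depth=1):
    """Run map tasks while the worker preloads up to `depth` blocks."""
    buffer = queue.Queue(maxsize=depth)
    worker = threading.Thread(target=prefetcher, args=(block_ids, buffer), daemon=True)
    worker.start()
    results = {}
    while (item := buffer.get()) is not None:
        block_id, data = item
        results[block_id] = run_map(data)  # the next block loads meanwhile
    worker.join()
    return results


results = process_with_prefetch(["block1", "block2", "block3"])
print(results)
```

The `depth` parameter is one way to approach the question of how much data to prefetch: a larger buffer hides more read latency but holds more data in memory.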

Prefetching Processing

Software Architecture

Grep Performance

• 1G: 9.5%
• 2G: 8.5%

WordCount Performance

• 1G: 8.9%
• 2G: 8.1%

Large/Small file in a node

• 9.1% Grep, 8.3% WordCount
• 18% Grep, 24% WordCount

Experiment Setting

Large/Small file in cluster

Summary

P2: Predictive Scheduler and Prefetching
• Goal: Move data before tasks are assigned
• Problem: Synchronizing tasks and data
• Contributions: Preload the required data earlier than the task is assigned
  – Predictive scheduler
  – Prefetching mechanism
  – Worker thread

Adaptive Preshuffling in Hadoop clusters

Preshuffling

• Observation 1: Too much data moves from Map workers to Reduce workers
  – Solution 1: Map nodes apply pre-shuffling functions to their local output
• Observation 2: No reduce can start until a map is complete
  – Solution 2: Intermediate data is pipelined between mappers and reducers

Preshuffling

• Goal: Minimize data shuffle during Reduce
• Approach
  – Pipeline
  – Overlap between map and data movement
  – Group map and reduce
• Challenges
  – Synchronize map and reduce
  – Data locality

Dataflow in Hadoop

[Figure: dataflow among HDFS (Block 1, Block 2), map tasks, local file systems, reduce tasks, and HDFS output]
Map side: 1. Submit job; 2. Schedule; 3. Read input; 4. Run map; 5. Heartbeat; 6. Next task
Reduce side: 2. New task; 3. Request data (HTTP GET); 4. Send data; 5. Write data

PreShuffle

[Figure: data requests between map tasks and reduce tasks]

In-memory buffer

Pipelining – A new design

[Figure: Block 1 and Block 2 flow from HDFS through map tasks and reduce tasks back to HDFS]


230 seconds vs 180 seconds


Sort Performance

Summary

P3: Preshuffling
• Goal: Minimize data shuffling during the Reduce phase
• Problem: Task distribution and synchronization
• Contributions: Preshuffling algorithm
  – Push data instead of the traditional pull
  – In-memory buffer
  – Pipeline
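The three ingredients named above (push, in-memory buffer, pipeline) can be illustrated with threads and a bounded queue. This is a sketch under assumptions, not the preshuffling algorithm itself: the pre-shuffling function is assumed here to be a local combine of word counts, and all names and inputs are hypothetical.

```python
import queue
import threading
from collections import Counter


def mapper(lines, buffer):
    """Map task that pushes its output as soon as each line is processed."""
    for line in lines:
        # Assumed pre-shuffle step: combine counts locally before sending.
        for word, count in Counter(line.split()).items():
            buffer.put((word, count))  # push into the in-memory buffer
    buffer.put(None)  # tell the reducer that this mapper has finished


def reducer(buffer, n_mappers, totals):
    """Reduce task that consumes records while the mappers are still running."""
    finished = 0
    while finished < n_mappers:
        item = buffer.get()
        if item is None:
            finished += 1
        else:
            word, count = item
            totals[word] += count


splits = [["a b a"], ["b b c"]]   # one input split per mapper
buffer = queue.Queue(maxsize=4)   # bounded in-memory buffer
totals = Counter()

reduce_thread = threading.Thread(target=reducer, args=(buffer, len(splits), totals))
map_threads = [threading.Thread(target=mapper, args=(s, buffer)) for s in splits]
reduce_thread.start()
for t in map_threads:
    t.start()
for t in map_threads:
    t.join()
reduce_thread.join()
print(dict(totals))
```

Because the reducer starts before any mapper finishes, map execution and data movement overlap; the bounded queue makes a mapper wait when the reducer falls behind.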

Conclusion

[Figure: MapReduce dataflow from Input through Map and Reduce tasks to Output, annotated with the three solutions]
• P1: Data placement (offline, distributed data, heterogeneous node)
• P2: Prefetching (online, data preloading, single node)
• P3: Preshuffling (intermediate data movement, reducing traffic)

Future Work

• Extend pipelining
  – Implement the pipelining design
• Small files issue
  – HAR file
  – Sequence file
  – CombineFileInputFormat
• Extend data placement

Thanks! Questions?

Run Time affected by Network Condition

Experiment result conducted by Yixian Yang

Traffic Volume affected by Network Condition

Experiment result conducted by Yixian Yang
