hadoop network performance profile

36
Profiling the Network Performance of Hadoop Jobs Team : Pramod Biligiri & Sayed Asad Ali

Upload: pramodbiligiri

Post on 14-Dec-2014

57 views

Category:

Software


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Hadoop Network Performance profile

Profiling the Network Performance of Hadoop Jobs

Team : Pramod Biligiri & Sayed Asad Ali

Page 2: Hadoop Network Performance profile

Talk OutlineIntroduction to the problemWhat is Hadoop?Hadoop’s MapReduce FrameworkShuffle as a BottleneckExperimental SetupChoice of BenchmarksTerasort DiscussionRanked Inverted Index DiscussionSummary and Future Work

Page 3: Hadoop Network Performance profile

Introduction to the problem

Reproduce existing results which show that the “Network” is the bottleneck in shuffle-intensive

Hadoop jobs.

Page 4: Hadoop Network Performance profile

What is Hadoop?A framework for distributed processing of large data sets across clusters of computers using simple programming models based on Google’s MapReduce.

Distinct Features:● Designed for Commodity Hardware● Highly Fault-tolerant● Horizontally Scalable● Push computation to data

Page 5: Hadoop Network Performance profile

MapReduce● MapReduce is a programming model for processing large data sets

with a parallel, distributed algorithm on a cluster● Programming Model

○ For each input record, generate (key, value)○ Apply reduce operation for all values corresponding to the

same key

Page 6: Hadoop Network Performance profile

Hadoop’s MapReduce Framework1. Prepare the Map() input2. Run the user-provided Map() code3. "Shuffle" the Map output to the Reduce processors4. Run the user-provided Reduce() code5. Produce the final output

Page 7: Hadoop Network Performance profile

MapReduce Flow

Page 8: Hadoop Network Performance profile

Shuffle!

Page 9: Hadoop Network Performance profile

Shuffle as a Bottleneck?“On average, the shuffle phase accounts for 33% of the running

time in these jobs. In addition, in 26% of the jobs with reduce tasks, shuffles account for more than 50% of the running time, and in 16% of jobs, they account for more than 70% of the running time. This confirms widely reported results that the network is a bottleneck in MapReduce”

Managing Data Transfers in Computer Clusters with Orchestra- Mosharaf Chowdhury et al

Page 10: Hadoop Network Performance profile

Chosen Benchmarks

● Terasort● Ranked Inverted Index

Page 11: Hadoop Network Performance profile

Experimental SetupsInstance type Memory CPU Elastic Compute Units Disk Network performance

Config 1 m1.large 7.5 GB 64-bit 4 2 x 420 GB Moderate

Config 2 m1.xlarge 15 GB 64-bit 8 4 x 420 GB High

SDSC custom 8 GB 64-bit/ Intel Xeon CPU 5140 @2.33 GHz, 4 cores

2 x 1.5 TB 1 Gb/s

Page 12: Hadoop Network Performance profile

Network Performance of EMRConflicting Values!Source 1 : with AppNeta pathtest

average : 753 Mb/shttp://www.appneta.com/resources/pathtest-download.html

Source 2 : “The available bandwidth is still 1 Gb/s, confirming anecdotal evidence that EC2 has full bisection bandwidth."

Opening Up Black Box Networks with CloudTalk, by Costin Raiciu et al

Source 3 : “The median TCP/UDP throughput of mediuminstances are both close to 760 Mb/s."

The Impact of Virtualization on Network Performance of Amazon EC2 Data Center, by Guohui Wang et al

Page 13: Hadoop Network Performance profile

Why Terasort?● Popular benchmark for Hadoop● Shipped with most Hadoop distributions.● Utilizes all aspects of the cluster - cpu, network, disk and memory● Large amount of data to shuffle (240 GB).● Representative of real world workloads

“This data shuffle pattern arises in large scale sorts, merges and join operations in the data center. We chose this test because, in our interactions with application developers, we learned that many use such operations with caution, because the operations are highly expensive in today’s data center network.”

source : VL2: A Scalable and Flexible Data Center Network - A. Greenberg et al.

Page 14: Hadoop Network Performance profile

Terasort - How it works:● Sorts 1 terabyte of data.● Each data item is 100 bytes in size.● The first 10 bytes of a data item constitute its sort key.● Format of input data:

○ <key 10 bytes><rowid 10 bytes><filler 78 bytes>\r\n■ key : random characters from ASCII 32-126■ rowid : an integer■ filler : random characters from the set A-Z

Page 15: Hadoop Network Performance profile

Terasort - How it works:Map

Partition input keys into different buckets

<Leverage Hadoop’s default sorting of Map output>

ReduceCollect outputs from different maps

Page 16: Hadoop Network Performance profile

Results

Page 17: Hadoop Network Performance profile

Comparison of Terasort on different configurations

Instance type

Config 1 m1.large (RAM 7.5 GB)

Config 2 m1.xlarge (RAM 15 GB)

SDSC Custom (RAM 8 GB)

Total job Time (min)

Map Time (min)

Reduce Time (min)

Shuffle Average Time

Shuffle Time %

Config 1 205 84 205 60 29.3

SDSC 166 60 90 36 21.7

Config 2 86 40 75 22 25.5

Page 18: Hadoop Network Performance profile

CDF of data transferred over the network during the lifetime of the job

Map ends

Shuffle starts

Shuffle endsReduce nearly done

Sorting of Map outputs (local to the node)

5100 6900

Page 19: Hadoop Network Performance profile

Network Transfer Rate on nodesNetwork Link Saturated

Page 20: Hadoop Network Performance profile

Disk I/O

Sorting of map outputs

Blue : ReadRed : Write

Page 21: Hadoop Network Performance profile

CPU Utilisation

Page 22: Hadoop Network Performance profile

Memory Statistics

Page 23: Hadoop Network Performance profile

Why Ranked Inverted Index?● For a given text corpus, for each word it generates a list of

documents containing the word in decreasing order of frequencyword -> (count1 | file1), (count2 | file2), ...

count1 > count2 > …

● A ranked inverted index is used often in text processing and information retrieval tasks

● Mentioned in the Tarazu paper as a Shuffle heavy workloadTarazu: Optimizing MapReduce On Heterogeneous Clusters, Faraz Ahmad et al.

Page 24: Hadoop Network Performance profile

Ranked Inverted Index - How it works:Map input: (word | filename) -> count

Map output: word -> (filename, count)

Reduce output: word -> (count1 | file1), (count2 | file2) ...It involves a sort of the values on the reduce side

(Note that the Map input is the output of another MapReduce job called sequence-count)

Page 25: Hadoop Network Performance profile

Experimental Results of Ranked Inverted Index

Instance type

Config 1 m1.large (RAM 7.5 GB)

Total job Time (min)

Map Time (min)

Reduce Time (min)

Shuffle Average Time

Shuffle Time %

Config 1 12 5.5 11.5 3.5 27.14

Input Data Set : 40 GB ftp://ftp.ecn.purdue.edu/fahmad/rankedinvindex_40GB.tar.bz2

Page 26: Hadoop Network Performance profile

CDF of data transferred over the network during the lifetime of the job

Map ends

Shuffle starts

Shuffle endsReduce nearly done

Replicating results to 3 Nodes

Page 27: Hadoop Network Performance profile

Network Transfer Rate on nodes

Network Link Saturated

Page 28: Hadoop Network Performance profile

Disk I/O Blue : ReadRed : Write

Page 29: Hadoop Network Performance profile

CPU Utilisation

Page 30: Hadoop Network Performance profile

Memory Statistics

Page 31: Hadoop Network Performance profile

Summary- Shuffle can constitute significant time of the total job runtime- Worth investing in good network connectivity for a compute cluster

Page 32: Hadoop Network Performance profile

Stuff that doesn’t add up!● Why does peak Network Bandwidth for Ranked Inverted Index

overshoot the 1Gb/s mark?● Why is the sort phase of RII so short?

Page 33: Hadoop Network Performance profile

Future Work● How does changing the various parameters make a difference? eg

io.sort.mb, io.sort.factor, fs.inmemory.size.mb● Effect of Combiners?● Varying the number of Map Tasks and Reduce Tasks● How many Map tasks are rack local or machine local?● Investigate the unresolved issues● Lack of precise information about “topology” and “network

bandwidth” for EMR Clusters

Page 34: Hadoop Network Performance profile

Q n A

Page 35: Hadoop Network Performance profile

Thank you!

Page 36: Hadoop Network Performance profile

Standard Test ResultsInput Size Run Time on

Hadoop (min)Shuffle Volume Critical Path

tera-sort 300 2353 200 Shuffle

ranked-inverted-index 205 2322 219 Shuffle