hj -hadoop an optimized mapreduce runtime for multi-core systems

1

HJ-HadoopAn Optimized MapReduce

Runtime for Multi-core Systems

Yunming ZhangAdvised by: Prof. Alan Cox and Vivek SarkarRice University

Social Informatics

Slide borrowed from Prof. Geoffrey Fox’s presentation

Big Data Era

20 PB/day

100 PB media

120 PB cluster

Bing ~ 300 PB

2008

2012

2011

2012

Slide borrowed from Florin Dinu’s presentation

4

MapReduce Runtime

Job Starts

Map

…….

Reduce

JobEnds

Figure 1. Map Reduce Programming Model

Map

MapReduce

5

Hadoop Map Reduce

• Open source implementation of Map Reduce Runtime system– Scalable– Reliable– Available

• Popular platform for big data analytics

6

Habanero Java(HJ)

• Programming Language and Runtime Developed at Rice University

• Optimized for multi-core systems– Lightweight async task– Work sharing runtime– Dynamic task parallelism– http://habanero.rice.edu

7

Kmeans

8

Kmeans

Topics

To be Classified Documents

Kmeans is an application that takes as input a large number of documents and try to classify them into different topics

Kmeans using Hadoop

9

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Machines …

Kmeans using Hadoop

10



slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Machines …

Kmeans using Hadoop

11



slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Topics

Machines …

Kmeans using Hadoop

12



slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Topics

Topics

Machines …

Kmeans using Hadoop

13



Duplicated In-memory Cluster Centroids

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics 1x

Topics 1x

Topics 1x

Topics 1x

Machines …

14

Memory Wall

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

Hadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

We used 8 mappers from 30 -80 MB, 4 mappers for 100 – 150 MB, 2 mappers for 180 – 380 for sequential Hadoop.

cluster size(MB)

HadoopFull GC calls

30 3,54250 4,39070 5,18680 1,108,888

15

Memory Wall

• Hadoop’s approach to the problem– Increase the memory available to each Map Task

JVM by reducing the number of map tasks assigned to each machine.

Kmeans using Hadoop

16



slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

…. Machines …

Topics 2x

Topics 2x

17

Memory Wall

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200


Hadoop


Num

ber o

f top

ics/

Tim

e in

min

Decreased throughput due to reduced number of map tasks per machine

We used 8 mappers from 30 -80 MB, 4 mappers for 100 – 150 MB, 2 mappers for 180 – 380 for sequential Hadoop.

18

HJ-Hadoop Approach 1

To be classified documents

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory

Machine1Map task in a JVM

Topics 4x

Machines …

Dynamic chunking

19



slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory


Topics 4x

Machines …

Dynamic chunking

NoDuplicated In-memory Cluster Centroids

20

Results

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200


HJ-HadoopHadoop


Num

ber o

f top

ics/

Tim

e in

min

We used 2 mappers for HJ-Hadoop

21

Results

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200


HJ-HadoopHadoop


Num

ber o

f top

ics/

Tim

e in

min


Process 5x Topics efficiently

22

Results

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200


HJ-HadoopHadoop


Num

ber o

f top

ics/

Tim

e in

min


4x throughput improvement

23



slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory


Topics 4x

Machines …

Dynamic chunking

24

HJ-Hadoop Approach 1Only a single thread reading input

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory


Topics 4x

Machines …

Dynamic chunking

Kmeans using Hadoop

25

Computation Memory


slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Topics

Topics

Machines …

Four threads reading input


26

Computation Memory


slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Machines …



27

Computation Memory


slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Machines …


NoDuplicated In-memory Cluster Centroids

28

Trade Offs between the two approaches

• Approach 1– Minimum memory overhead– Improved CPU utilization with small task granularity

• Approach 2– Improved IO performance– Overlap between IO and Computation

• Hybrid Approach– Improved IO with small memory overhead– Improved CPU utilization

29

Conclusions

• Our goal is to tackle the memory inefficiency in the execution of MapReduce applications on multi-core systems by integrating a shared memory parallel model into Hadoop MapReduce runtime– HJ-Hadoop can be used to solve larger problems

efficiently than Hadoop, processing process 5x more data at full throughput of the system

– The HJ-Hadoop can deliver a 4x throughput relative to Hadoop processing very large in-memory data sets

hj -hadoop an optimized mapreduce runtime for multi-core systems

Documents

map phase

hadoop mapreduce

thousands of computers

terabytes of data

big data analytics application

work hjhadoop

large number of map

work focus