hj -hadoop an optimized mapreduce runtime for multi-core systems

29
HJ-Hadoop An Optimized MapReduce Runtime for Multi-core Systems Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University 1

Upload: hetal

Post on 22-Feb-2016

40 views

Category:

Documents


0 download

DESCRIPTION

HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems. Yunming Zhang Advised by: Prof. Alan Cox and Vivek Sarkar Rice University. Social Informatics. Big Data Era. 2008. 20 PB/day. 100 PB media. 2012. 120 PB cluster. 2011. Bing ~ 300 PB. 2012. MapReduce Runtime. Map. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

1

HJ-HadoopAn Optimized MapReduce

Runtime for Multi-core Systems

Yunming ZhangAdvised by: Prof. Alan Cox and Vivek SarkarRice University

Page 2: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

Social Informatics

Slide borrowed from Prof. Geoffrey Fox’s presentation

Page 3: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

Big Data Era

20 PB/day

100 PB media

120 PB cluster

Bing ~ 300 PB

2008

2012

2011

2012

Slide borrowed from Florin Dinu’s presentation

Page 4: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

4

MapReduce Runtime

Job Starts

Map

…….

Reduce

JobEnds

Figure 1. Map Reduce Programming Model

Map

MapReduce

Page 5: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

5

Hadoop Map Reduce

• Open source implementation of Map Reduce Runtime system– Scalable– Reliable– Available

• Popular platform for big data analytics

Page 6: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

6

Habanero Java(HJ)

• Programming Language and Runtime Developed at Rice University

• Optimized for multi-core systems– Lightweight async task– Work sharing runtime– Dynamic task parallelism– http://habanero.rice.edu

Page 7: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

7

Kmeans

Page 8: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

8

Kmeans

Topics

To be Classified Documents

Kmeans is an application that takes as input a large number of documents and try to classify them into different topics

Page 9: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

Kmeans using Hadoop

9

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Machines …

Page 10: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

Kmeans using Hadoop

10

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Machines …

Page 11: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

Kmeans using Hadoop

11

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Topics

Machines …

Page 12: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

Kmeans using Hadoop

12

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Topics

Topics

Machines …

Page 13: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

Kmeans using Hadoop

13

To be classified documents Computation Memory

Machine 1Map task in a JVM

Duplicated In-memory Cluster Centroids

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics 1x

Topics 1x

Topics 1x

Topics 1x

Machines …

Page 14: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

14

Memory Wall

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

Hadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

We used 8 mappers from 30 -80 MB, 4 mappers for 100 – 150 MB, 2 mappers for 180 – 380 for sequential Hadoop.

cluster size(MB)

HadoopFull GC calls

30 3,54250 4,39070 5,18680 1,108,888

Page 15: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

15

Memory Wall

• Hadoop’s approach to the problem– Increase the memory available to each Map Task

JVM by reducing the number of map tasks assigned to each machine.

Page 16: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

Kmeans using Hadoop

16

To be classified documents Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

…. Machines …

Topics 2x

Topics 2x

Page 17: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

17

Memory Wall

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

Hadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

Decreased throughput due to reduced number of map tasks per machine

We used 8 mappers from 30 -80 MB, 4 mappers for 100 – 150 MB, 2 mappers for 180 – 380 for sequential Hadoop.

Page 18: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

18

HJ-Hadoop Approach 1

To be classified documents

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory

Machine1Map task in a JVM

Topics 4x

Machines …

Dynamic chunking

Page 19: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

19

HJ-Hadoop Approach 1

To be classified documents

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory

Machine1Map task in a JVM

Topics 4x

Machines …

Dynamic chunking

NoDuplicated In-memory Cluster Centroids

Page 20: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

20

Results

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

HJ-HadoopHadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

We used 2 mappers for HJ-Hadoop

Page 21: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

21

Results

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

HJ-HadoopHadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

We used 2 mappers for HJ-Hadoop

Process 5x Topics efficiently

Page 22: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

22

Results

0 50 100 150 200 250 300 350 4000

20

40

60

80

100

120

140

160

180

200

KMeans Throughput Benchmark

HJ-HadoopHadoop

Topics data size (MB) with 4KB/topic

Num

ber o

f top

ics/

Tim

e in

min

We used 2 mappers for HJ-Hadoop

4x throughput improvement

Page 23: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

23

HJ-Hadoop Approach 1

To be classified documents

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory

Machine1Map task in a JVM

Topics 4x

Machines …

Dynamic chunking

Page 24: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

24

HJ-Hadoop Approach 1Only a single thread reading input

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Computation Memory

Machine1Map task in a JVM

Topics 4x

Machines …

Dynamic chunking

Page 25: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

Kmeans using Hadoop

25

Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Topics

Topics

Topics

Machines …

Four threads reading input

Page 26: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

HJ-Hadoop Approach 2

26

Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Machines …

Four threads reading input

Page 27: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

HJ-Hadoop Approach 2

27

Computation Memory

Machine 1Map task in a JVM

slice1

slice2

slice3

slice4

slice5

slice6

slice7

slice8

….

Topics

Machines …

Four threads reading input

NoDuplicated In-memory Cluster Centroids

Page 28: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

28

Trade Offs between the two approaches

• Approach 1– Minimum memory overhead– Improved CPU utilization with small task granularity

• Approach 2– Improved IO performance– Overlap between IO and Computation

• Hybrid Approach– Improved IO with small memory overhead– Improved CPU utilization

Page 29: HJ -Hadoop An Optimized MapReduce Runtime for Multi-core Systems

29

Conclusions

• Our goal is to tackle the memory inefficiency in the execution of MapReduce applications on multi-core systems by integrating a shared memory parallel model into Hadoop MapReduce runtime– HJ-Hadoop can be used to solve larger problems

efficiently than Hadoop, processing process 5x more data at full throughput of the system

– The HJ-Hadoop can deliver a 4x throughput relative to Hadoop processing very large in-memory data sets