Hadoop on a Personal Supercomputer
Hadoop Summit 2011 presentation transcript
PERVASIVE DATA INNOVATION
Hadoop on a Personal Supercomputer
Paul Dingman – Chief Technologist, Integration Division
Pervasive and Hadoop
• Pervasive Software develops software products to manage, integrate, and analyze data.
• Innovation Lab projects around big data include:
– Hadoop
• Accelerate MapReduce (DataRush Community Edition)
• High-speed add-ons for HBase, Avro, Hive (TurboRush)
• Augment Sqoop
• Enhance ETL capabilities
– Benchmarks
• Terasort
• TPC-H
• SIEM/LogAnalytics EPS
• Genomics
Why are many-core systems interesting?
• Many-core processors make it possible to concentrate large amounts of processing power in a single machine. Coupled with newer storage technologies, these systems can have high-speed access to tremendous amounts of storage.
• We have done a lot of work with multi-core systems at Pervasive Software. Our Pervasive DataRush™ Dataflow Engine takes advantage of all available processor cores to efficiently process large volumes of data.
– Analytics
– Data mining
– Genomics
• Potential cost and energy savings due to the need for fewer nodes.
• Potential performance gains by eliminating inter-node data exchange.
Pervasive DataRush™ Speed and Scalability
• World record performance set running the Smith-Waterman algorithm
• Code written on an 8-core machine scaled to 384 cores with no changes!
Malstone-B10* Scalability
Run-time for 10B rows (time in minutes vs. core count):

Cores | Run-time (minutes)
2     | 370.0
4     | 192.4 (3.2 hours)
8     | 90.3 (1.5 hours)
16    | 51.6 (under 1 hour)
32    | 31.5

* Cyber security benchmark from the Open Cloud Consortium
How well does Hadoop work on many-core systems?
• One of the areas we wanted to explore was how well Hadoop works on systems with lots of cores. In other words, is it possible to run Hadoop in an environment where you could exploit the cores for complex operations, but still have the benefits of the distributed environment provided by Hadoop and HDFS?
Master Node (NameNode/JobTracker)
Commodity Box
• 2 Intel Xeon L5310 CPUs, 1.6 GHz (8 cores)
• 16 GB DRAM (ECC)
• 8 SATA Hard Disks (4 TB)
• Mellanox ConnectX-2 VPI Dual Port Infiniband Adapter

[Diagram: master node with 16 GB local DRAM, two processors, and 8 local 500 GB spindles]
Slave Nodes (DataNode/TaskTracker)
• 4 AMD Opteron 6172 CPUs (48 cores)
• Supermicro MB
• 1 LSI 8-port HBA (6 Gbps)
• 2 SATA SSDs (512 GB)
• 256 GB DRAM (ECC)
• 32 SATA Hard Disks (64 TB)
• Mellanox ConnectX-2 VPI Dual Port Infiniband Adapter

[Diagram: slave node with 256 GB local DRAM, four processors, 24 x 2 TB HDFS spindles (JBOD), and 8 x 2 TB local spindles for intermediate data]
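The HDFS/local spindle split above maps directly onto Hadoop's storage settings. Below is a minimal sketch of how such a layout might be expressed with the CDH3-era properties dfs.data.dir and mapred.local.dir; the mount-point paths are hypothetical, not the ones used on these nodes.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch: mapping a fat node's disk split onto Hadoop's storage
// settings (CDH3 / Hadoop 0.20). Mount-point paths are hypothetical.
public class SlaveStorageLayout {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // 24 JBOD spindles hold HDFS blocks; dfs.data.dir takes a
        // comma-separated list of directories, one per spindle.
        StringBuilder hdfsDirs = new StringBuilder();
        for (int i = 0; i < 24; i++) {
            if (i > 0) hdfsDirs.append(',');
            hdfsDirs.append("/data/hdfs").append(i);
        }
        conf.set("dfs.data.dir", hdfsDirs.toString());

        // 8 dedicated spindles absorb intermediate map output so shuffle
        // traffic does not compete with HDFS reads and writes.
        StringBuilder localDirs = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i > 0) localDirs.append(',');
            localDirs.append("/scratch/local").append(i);
        }
        conf.set("mapred.local.dir", localDirs.toString());

        System.out.println("dfs.data.dir     = " + conf.get("dfs.data.dir"));
        System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
    }
}
```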
Hadoop Cluster
[Diagram: one master and two slave nodes, each with local DRAM and disks, connected via IPoIB]
• 104 cores (8/48/48)
• 128 TB storage (96 TB HDFS)
• 512 GB of memory
• 40 Gb Infiniband interconnects (IPoIB)
• CentOS 5.6
• Hadoop (Cloudera CDH3u0)
Hadoop Tuning
• We worked from the bottom up.
– Linux (various kernels and kernel settings)
– File systems (EXT2, EXT3, EXT4)
– Drivers (HBA)
– JVMs
• Initial tests were done using a single “fat” node (same config as the worker nodes).
– This made it easier to test different disk configurations.
• For Hadoop tests we primarily used 100 GB Terasort jobs, which exercise all phases of the MapReduce process while not being too large to run frequently (a sketch of the harness follows).
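A 100 GB run is one billion 100-byte rows. The sketch below shows one way to drive such a run from Java with the stock TeraGen/TeraSort example jobs; the HDFS paths are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.terasort.TeraGen;
import org.apache.hadoop.examples.terasort.TeraSort;
import org.apache.hadoop.util.ToolRunner;

// Minimal sketch of a repeatable 100 GB Terasort run using the stock
// Hadoop example jobs (CDH3 / Hadoop 0.20). HDFS paths are hypothetical.
public class Terasort100GB {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Generate the input: 100 GB = 1,000,000,000 rows of 100 bytes each.
        ToolRunner.run(conf, new TeraGen(),
                new String[] { "1000000000", "/bench/tera-in" });

        // Sort it; this exercises map, shuffle, merge, and reduce.
        ToolRunner.run(conf, new TeraSort(),
                new String[] { "/bench/tera-in", "/bench/tera-out" });
    }
}
```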
Lessons Learned with Single Node Tuning
• We found we could comfortably run 40 maps and 20 reducers given memory and CPU constraints.
• Use a large block size for HDFS.
– Execution time for map tasks was around 1 minute using a 512 MB block size.
• More spindles are better.
– A 1:1 ratio of map tasks to local HDFS spindles works well.
– EXT2 seems to work well with JBOD.
• Dedicate spindles to temporary files on each worker node.
• Configure JVM settings for a larger heap size to avoid spills.
– Parallel GC seemed to help as well.
• Compression of map outputs is a huge win (LZO).
• HBase scales well on fat nodes with DataRush (> 5M rows/sec bulk load; > 10M rows/sec sequential scan).
(The main settings are sketched below.)
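A minimal sketch of how these lessons translate into CDH3-era configuration properties; the exact values below are illustrative assumptions, not the precise settings from our runs.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch of the CDH3-era settings behind these lessons; values
// are illustrative assumptions.
public class FatNodeTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // 40 map / 20 reduce slots per 48-core node.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 40);
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 20);

        // Large HDFS block size: ~1 minute per map task at 512 MB.
        conf.setLong("dfs.block.size", 512L * 1024 * 1024);

        // Larger heap and sort buffer to avoid spills; parallel GC helped.
        conf.set("mapred.child.java.opts", "-Xmx2g -XX:+UseParallelGC");
        conf.setInt("io.sort.mb", 512);

        // LZO compression of map outputs was a huge win.
        conf.setBoolean("mapred.compress.map.output", true);
        conf.set("mapred.map.output.compression.codec",
                 "com.hadoop.compression.lzo.LzoCodec");
    }
}
```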
Varying Spindles for HDFS

[Chart: Terasort average execution time (seconds) vs. number of 2 TB HDFS disks, from 8 to 48]
Varying Spindles for Intermediate Outputs

[Chart: Terasort average execution time (seconds) with intermediate map output on 4 x 2 TB, 8 x 2 TB, and 16 x 2 TB disk sets, a Fusion-io flash drive, and RAID 0 (4 x 2 TB)]
Single node 100 GB Terasort

[Chart: active tasks (0 to ~70) vs. execution time (0 to ~608 seconds), broken out by map, shuffle, merge, and reduce phases]
Clustering the Nodes
• We had a total of 64 hard disks for the cluster and had to split them between the two nodes.
• Installed and configured OpenFabrics OFED to enable IPoIB.
• Reconfigured Hadoop to cluster the nodes (the key settings are sketched below).
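A minimal sketch of the CDH3-era properties that point both nodes at a common NameNode and JobTracker; the hostname and ports below are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch of pointing worker nodes at the master's NameNode and
// JobTracker (CDH3 / Hadoop 0.20). Hostname and ports are assumptions.
public class ClusterWiring {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // HDFS NameNode endpoint (normally set in core-site.xml).
        conf.set("fs.default.name", "hdfs://master:8020");

        // MapReduce JobTracker endpoint (normally set in mapred-site.xml).
        conf.set("mapred.job.tracker", "master:8021");

        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
    }
}
```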
Cluster 100 GB Terasort

[Chart: active tasks (0 to ~120) vs. execution time (seconds), broken out by map, shuffle, merge, and reduce phases on the two-worker cluster]
Comparisons with Amazon Clusters
• The Amazon clusters were used to get a better idea of what to expect using more conventionally sized Hadoop nodes (non-EMR).
• We used ‘Cluster Compute Quadruple Extra Large’ instances:
– 23 GB of memory
– 33.5 EC2 Compute Units (dual Intel Xeon X5570 quad-core “Nehalem” processors; 8 cores total)
– 1690 GB of instance storage (2 spindles)
– Very high I/O performance (10 GbE)
• Used a similar Hadoop configuration, but dialed back the number of maps and reducers due to the lower core count.
• Used cluster sizes that were roughly core-count equivalent for comparison.
Per Node Comparison

Feature      | Amazon cc1.4xlarge | Personal Supercomputer
Cores        | 8                  | 48
Memory       | 23 GB              | 256 GB
Memory/core  | 2.875 GB           | 5.333 GB
Spindles     | 2                  | 32 (24 HDFS / 8 temp)
Storage      | 1.690 TB           | 64 TB
Network      | 10 Gb Ethernet     | IPoIB (40 Gb Infiniband)
Performance Comparison (100 GB Terasort)

Configuration                            | Cores | Execution time (secs) | Run-time cost (cents) | MB/dollar
PSC single node                          | 48    | 712                   | 40                    | 250
Amazon HPC cluster (6 workers + master)  | 48    | 743                   | 231                   | 43
PSC cluster (2 workers + master)         | 96    | 388                   | 94                    | 106
Amazon HPC cluster (11 workers + master) | 88    | 460                   | 245                   | 41
Conclusions
• From what we have seen, Hadoop works very well on many-core systems. In fact, Hadoop runs quite well even on a single-node many-core system.
• Using denser nodes may make failures more expensive for some system components. When using disk arrays, the handling of hard disk failures should be comparable to smaller nodes.
• The MapReduce framework treats all intermediate outputs as remote resources, so the copy phase of MapReduce doesn't benefit from data locality.
PERVASIVE DATA INNOVATION
Questions?
Follow up / more information:
Visit our booth
Pervasive DataRush for Hadoop:
www.pervasivedatarush.com/Technology/PervasiveDataRushforHadoop.aspx
Presentation content: [email protected]