Hadoop on a Personal Supercomputer
Hadoop Summit 2011 presentation transcript
PERVASIVE DATA INNOVATION
Hadoop on a Personal Supercomputer
Paul Dingman – Chief Technologist, Integration Division
Pervasive and Hadoop
• Pervasive Software develops software products to manage, integrate, and analyze data.
• Innovation Lab projects around big data include:
– Hadoop
• Accelerate MapReduce (DataRush Community Edition)
• High-speed add-ons for HBase, Avro, Hive (TurboRush)
• Augment Sqoop
• Enhance ETL capabilities
– Benchmarks
• Terasort
• TPC-H
• SIEM/LogAnalytics EPS
• Genomics
Why are many-core systems interesting?
• Many-core processors make it possible to concentrate large amounts of processing power in a single machine. Coupled with newer storage technologies, these systems can have high-speed access to tremendous amounts of storage.
• We have done a lot of work with multi-core systems at Pervasive Software. Our Pervasive DataRush™ Dataflow Engine takes advantage of all available processor cores to efficiently process large volumes of data.
– Analytics
– Data mining
– Genomics
• Potential cost and energy savings due to the need for fewer nodes.
• Potential performance gains by eliminating inter-node data exchange.
Pervasive DataRush™ Speed and Scalability
• World record performance set running the Smith-Waterman algorithm
• Code written on an 8-core machine scaled to 384 cores with no changes!
Malstone-B10* Scalability
Run-time for 10B rows (time in minutes vs. core count):

Cores | Run-time (minutes)
2     | 370.0
4     | 192.4 (3.2 hours)
8     | 90.3 (1.5 hours)
16    | 51.6 (under 1 hour)
32    | 31.5

* Cyber security benchmark from the Open Cloud Consortium
How well does Hadoop work on many-core systems?
• One of the areas we wanted to explore was how well Hadoop works on systems with lots of cores. In other words, is it possible to run Hadoop in an environment where you could exploit the cores for complex operations, but still have the benefits of the distributed environment provided by Hadoop and HDFS?
Master Node (NameNode/JobTracker)
Commodity Box
• 2 Intel Xeon L5310 CPUs, 1.6 GHz (8 cores)
• 16 GB DRAM (ECC)
• 8 SATA Hard Disks (4 TB)
• Mellanox ConnectX-2 VPI Dual Port Infiniband Adapter

[Diagram: master node with 16 GB local DRAM, two processors, and 8 local 500 GB spindles]
Slave Nodes (DataNode/TaskTracker)
• 4 AMD Opteron 6172 CPUs (48 cores)
• Supermicro MB
• 1 LSI 8-port HBA (6 Gbps)
• 2 SATA SSDs (512 GB)
• 256 GB DRAM (ECC)
• 32 SATA Hard Disks (64 TB)
• Mellanox ConnectX-2 VPI Dual Port Infiniband Adapter

[Diagram: slave node with 256 GB local DRAM, four processors, 24 x 2 TB HDFS spindles (JBOD), and 8 x 2 TB local spindles for intermediate data]
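The HDFS/local spindle split above maps directly onto Hadoop's storage settings. Below is a minimal sketch of how such a layout might be expressed with the CDH3-era properties dfs.data.dir and mapred.local.dir; the mount-point paths are hypothetical, not the ones used on these nodes.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch: mapping a fat node's disk split onto Hadoop's storage
// settings (CDH3 / Hadoop 0.20). Mount-point paths are hypothetical.
public class SlaveStorageLayout {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // 24 JBOD spindles hold HDFS blocks; dfs.data.dir takes a
        // comma-separated list of directories, one per spindle.
        StringBuilder hdfsDirs = new StringBuilder();
        for (int i = 0; i < 24; i++) {
            if (i > 0) hdfsDirs.append(',');
            hdfsDirs.append("/data/hdfs").append(i);
        }
        conf.set("dfs.data.dir", hdfsDirs.toString());

        // 8 dedicated spindles absorb intermediate map output so shuffle
        // traffic does not compete with HDFS reads and writes.
        StringBuilder localDirs = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i > 0) localDirs.append(',');
            localDirs.append("/scratch/local").append(i);
        }
        conf.set("mapred.local.dir", localDirs.toString());

        System.out.println("dfs.data.dir     = " + conf.get("dfs.data.dir"));
        System.out.println("mapred.local.dir = " + conf.get("mapred.local.dir"));
    }
}
```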
Hadoop Cluster
[Diagram: one master and two slave nodes, each with local DRAM and disks, connected via IPoIB]
• 104 cores (8/48/48)
• 128 TB storage (96 TB HDFS)
• 512 GB of memory
• 40 Gb Infiniband interconnects (IPoIB)
• CentOS 5.6
• Hadoop (Cloudera CDH3u0)
Hadoop Tuning
• We worked from the bottom up.
– Linux (various kernels and kernel settings)
– File systems (EXT2, EXT3, EXT4)
– Drivers (HBA)
– JVMs
• Initial tests were done using a single “fat” node (same config as the worker nodes).
– This made it easier to test different disk configurations.
• For Hadoop tests we primarily used 100 GB Terasort jobs, which exercise all phases of the MapReduce process while not being too large to run frequently (a sketch of the harness follows).
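A 100 GB run is one billion 100-byte rows. The sketch below shows one way to drive such a run from Java with the stock TeraGen/TeraSort example jobs; the HDFS paths are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.examples.terasort.TeraGen;
import org.apache.hadoop.examples.terasort.TeraSort;
import org.apache.hadoop.util.ToolRunner;

// Minimal sketch of a repeatable 100 GB Terasort run using the stock
// Hadoop example jobs (CDH3 / Hadoop 0.20). HDFS paths are hypothetical.
public class Terasort100GB {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Generate the input: 100 GB = 1,000,000,000 rows of 100 bytes each.
        ToolRunner.run(conf, new TeraGen(),
                new String[] { "1000000000", "/bench/tera-in" });

        // Sort it; this exercises map, shuffle, merge, and reduce.
        ToolRunner.run(conf, new TeraSort(),
                new String[] { "/bench/tera-in", "/bench/tera-out" });
    }
}
```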
Lessons Learned with Single Node Tuning
• We found we could comfortably run 40 maps and 20 reducers given memory and CPU constraints.
• Use a large block size for HDFS.
– Execution time for map tasks was around 1 minute using a 512 MB block size.
• More spindles are better.
– A 1:1 ratio of map tasks to local HDFS spindles works well.
– EXT2 seems to work well with JBOD.
• Dedicate spindles to temporary files on each worker node.
• Configure JVM settings for a larger heap size to avoid spills.
– Parallel GC seemed to help as well.
• Compression of map outputs is a huge win (LZO).
• HBase scales well on fat nodes with DataRush (> 5M rows/sec bulk load; > 10M rows/sec sequential scan).
(The main settings are sketched below.)
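A minimal sketch of how these lessons translate into CDH3-era configuration properties; the exact values below are illustrative assumptions, not the precise settings from our runs.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch of the CDH3-era settings behind these lessons; values
// are illustrative assumptions.
public class FatNodeTuning {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // 40 map / 20 reduce slots per 48-core node.
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 40);
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 20);

        // Large HDFS block size: ~1 minute per map task at 512 MB.
        conf.setLong("dfs.block.size", 512L * 1024 * 1024);

        // Larger heap and sort buffer to avoid spills; parallel GC helped.
        conf.set("mapred.child.java.opts", "-Xmx2g -XX:+UseParallelGC");
        conf.setInt("io.sort.mb", 512);

        // LZO compression of map outputs was a huge win.
        conf.setBoolean("mapred.compress.map.output", true);
        conf.set("mapred.map.output.compression.codec",
                 "com.hadoop.compression.lzo.LzoCodec");
    }
}
```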
Varying Spindles for HDFS

[Chart: Terasort average execution time (seconds) vs. number of 2 TB HDFS disks, from 8 to 48]
Varying Spindles for Intermediate Outputs

[Chart: Terasort average execution time (seconds) with intermediate map output on 4 x 2 TB, 8 x 2 TB, and 16 x 2 TB disk sets, a Fusion-io flash drive, and RAID 0 (4 x 2 TB)]
Single node 100 GB Terasort

[Chart: active tasks (0 to ~70) vs. execution time (0 to ~608 seconds), broken out by map, shuffle, merge, and reduce phases]
Clustering the Nodes
• We had a total of 64 hard disks for the cluster and had to split them between the two nodes.
• Installed and configured OpenFabrics OFED to enable IPoIB.
• Reconfigured Hadoop to cluster the nodes (the key settings are sketched below).
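A minimal sketch of the CDH3-era properties that point both nodes at a common NameNode and JobTracker; the hostname and ports below are illustrative assumptions.

```java
import org.apache.hadoop.conf.Configuration;

// Minimal sketch of pointing worker nodes at the master's NameNode and
// JobTracker (CDH3 / Hadoop 0.20). Hostname and ports are assumptions.
public class ClusterWiring {
    public static void main(String[] args) {
        Configuration conf = new Configuration();

        // HDFS NameNode endpoint (normally set in core-site.xml).
        conf.set("fs.default.name", "hdfs://master:8020");

        // MapReduce JobTracker endpoint (normally set in mapred-site.xml).
        conf.set("mapred.job.tracker", "master:8021");

        System.out.println("fs.default.name    = " + conf.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + conf.get("mapred.job.tracker"));
    }
}
```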
Cluster 100 GB Terasort

[Chart: active tasks (0 to ~120) vs. execution time (seconds), broken out by map, shuffle, merge, and reduce phases on the two-worker cluster]
Comparisons with Amazon Clusters
• The Amazon clusters were used to get a better idea of what to expect using more conventionally sized Hadoop nodes (non-EMR).
• We used ‘Cluster Compute Quadruple Extra Large’ instances:
– 23 GB of memory
– 33.5 EC2 Compute Units (dual Intel Xeon X5570 quad-core “Nehalem” processors; 8 cores total)
– 1690 GB of instance storage (2 spindles)
– Very high I/O performance (10 GbE)
• Used a similar Hadoop configuration, but dialed back the number of maps and reducers due to the lower core count.
• Used cluster sizes that were roughly core-count equivalent for comparison.
Per Node Comparison

Feature      | Amazon cc1.4xlarge | Personal Supercomputer
Cores        | 8                  | 48
Memory       | 23 GB              | 256 GB
Memory/core  | 2.875 GB           | 5.333 GB
Spindles     | 2                  | 32 (24 HDFS / 8 temp)
Storage      | 1.690 TB           | 64 TB
Network      | 10 Gb Ethernet     | IPoIB (40 Gb Infiniband)
Performance Comparison (100 GB Terasort)

Configuration                            | Cores | Execution time (secs) | Run-time cost (cents) | MB/dollar
PSC single node                          | 48    | 712                   | 40                    | 250
Amazon HPC cluster (6 workers + master)  | 48    | 743                   | 231                   | 43
PSC cluster (2 workers + master)         | 96    | 388                   | 94                    | 106
Amazon HPC cluster (11 workers + master) | 88    | 460                   | 245                   | 41
Conclusions
• From what we have seen, Hadoop works very well on many-core systems. In fact, Hadoop runs quite well even on a single-node many-core system.
• Using denser nodes may make failures more expensive for some system components. When using disk arrays, the handling of hard disk failures should be comparable to smaller nodes.
• The MapReduce framework treats all intermediate outputs as remote resources, so the copy phase of MapReduce doesn't benefit from data locality.
PERVASIVE DATA INNOVATION
Questions?
Follow up / more information:
Visit our booth
Pervasive DataRush for Hadoop:
www.pervasivedatarush.com/Technology/PervasiveDataRushforHadoop.aspx
Presentation content: [email protected]