Oracle Solaris 11 as a Big Data Platform: Apache Hadoop Use Case

Orgad Kimchi, Principal Software Engineer, Oracle ISV Engineering

Uploaded by orgad-kimchi on 11-May-2015


DESCRIPTION

The following are benefits of using Oracle Solaris Zones for a Hadoop cluster:

• Fast provisioning of new cluster members using the zone cloning feature
• Very high network throughput between the zones for DataNode replication
• Optimized disk I/O utilization for better I/O performance with ZFS built-in compression
• Secure data at rest using ZFS encryption

For more information, see: http://www.oracle.com/technetwork/articles/servers-storage-admin/howto-setup-hadoop-zones-1899993.html

TRANSCRIPT

Page 1: Oracle Solaris 11 as a BIG Data Platform Apache Hadoop Use Case


Oracle Solaris 11 as a Big Data Platform Apache Hadoop Use Case Orgad Kimchi, Principal Software Engineer

Oracle ISV Engineering

Page 2

Copyright © 2013, Oracle and/or its affiliates. All rights reserved. Oracle Confidential, Proprietary Information

Disclaimer

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle Corporation.

Page 3

Agenda

Hadoop Overview

The Benefits of Using Oracle Solaris Technologies for a Hadoop Cluster

Page 4

What is Big Data

Big Data is both:

• Large and variable datasets: extremely large files of unstructured or semi-structured data, and large, highly distributed datasets that are otherwise difficult to manage as a single unit of information

• A new set of technologies that can economically acquire, organize, store, analyze, and extract value from Big Data datasets, thus facilitating better, more informed business decisions

Page 5

Introduction To Hadoop

Page 6

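What Is Hadoop?

• Based on ideas that originated at Google (the Google File System paper, 2003, and the MapReduce paper, 2004), where they were used to generate search indexes and web scores
• A top-level Apache project
• Consists of two key services:

1. Hadoop Distributed File System (HDFS): highly scalable, fault-tolerant, and distributed

2. MapReduce API (Java), which can also be scripted in other languages (for example, through Hadoop Streaming)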

Hadoop brings the ability to cheaply process large amounts of data, regardless of its structure.
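The MapReduce model described above can be sketched with the classic word-count job. The following Python simulates the map, shuffle, and reduce stages in a single process. It only illustrates the data flow of the model under that assumption; it is not Hadoop's Java MapReduce API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between stages."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over all input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data on Solaris", "Hadoop processes big data"]))
```

In a real cluster, many mappers and reducers run in parallel on different TaskTrackers, and the shuffle moves intermediate pairs across the network; here all three stages run sequentially in one process.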

Page 7

Components of Hadoop

Page 8

HDFS

HDFS is the file system responsible for storing data on the cluster:

• Written in Java (based on Google's GFS)
• Sits on top of a native file system (ext3, ext4, XFS, ZFS)
• POSIX-like file permissions model
• Provides redundant storage for massive amounts of data
• Optimized for large, streaming reads of files
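To make "redundant storage" concrete, the sketch below splits a file into fixed-size blocks and assigns each block to several DataNodes. It assumes a 64 MB block size and a replication factor of 3 (the Hadoop 1.x defaults for dfs.block.size and dfs.replication). It also uses simple round-robin placement, which is a deliberate simplification: real HDFS uses a rack-aware placement policy. The helper names are hypothetical.

```python
from typing import Dict, List

BLOCK_SIZE = 64 * 1024 * 1024  # assumed block size: 64 MB (Hadoop 1.x default)
REPLICATION = 3                # assumed replication factor (dfs.replication default)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each HDFS block a file of file_size bytes occupies."""
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct DataNodes, round-robin.

    Simplification: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    return {
        block: [datanodes[(block + i) % len(datanodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(200 * 1024 * 1024)
    print([b // (1024 * 1024) for b in blocks], "MB blocks")
    print(place_replicas(len(blocks), ["data-node1", "data-node2", "data-node3"]))
```

For example, a 200 MB file occupies three full 64 MB blocks plus one 8 MB block, and with three replicas it consumes 600 MB of raw disk space across the cluster.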

Page 9

The Five Hadoop Daemons

Hadoop consists of five separate daemons:

• NameNode: Holds the metadata for HDFS
• Secondary NameNode: Performs housekeeping functions (checkpointing) for the NameNode
• DataNode: Stores the actual HDFS data blocks
• JobTracker: Manages MapReduce jobs and distributes individual tasks to machines running the TaskTracker; coordinates MapReduce stages
• TaskTracker: Responsible for instantiating and monitoring individual Map and Reduce tasks

Page 10

Hadoop Architecture

Page 11

The benefits of using Oracle Solaris technologies for a Hadoop cluster

Page 12

Solaris Zones Hadoop Architecture

Page 13

Built-in Virtualization: Oracle Solaris 11 Zones

• Secure, lightweight virtualization

• Scales to hundreds of zones per node

• Built-in, no-cost virtualization

• Combines isolation with resource management

• Widely used for:
  • Consolidation
  • Legacy OS support
  • Rapid application deployment
  • Securely protecting applications

Co-engineered with installation, security, ZFS, networking, IPS, and the SPARC and x86 hypervisors

One out of three Oracle Solaris systems runs Oracle Solaris Zones

Page 14

Oracle Solaris Zones Benefits

The benefits of using Oracle Solaris Zones for a Hadoop cluster:

• Fast provisioning of new cluster members using the Solaris Zones cloning feature

• Very high network throughput between the zones for DataNode replication

Page 15

Oracle Solaris Zones

Source http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen

Page 16

Oracle Solaris Zones

Source http://dtrace.org/blogs/brendan/2013/01/11/virtualization-performance-zones-kvm-xen

Page 17

Oracle Solaris 11: Storage Virtualization, Secure Datasets for Each Tenant

• Virtual flash-enabled storage pools for speed

• Built-in data services save storage software costs

• File and block sharing

• Wire-speed encryption on disk and over the wire

• Extreme data integrity

• Unlimited scale

10x storage savings for virtualization

2x storage compression

Diagram: Finance, HR, and Sales zones, each with its own dataset (Finance Dataset, HR Dataset, Sales Dataset)

Page 18

Oracle Solaris ZFS Benefits

The benefits of using Oracle Solaris ZFS for a Hadoop cluster:

• Immense data capacity: a 128-bit file system, well suited to big datasets

• Optimized disk I/O utilization for better I/O performance with ZFS built-in compression

• Secure data at rest using ZFS encryption
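The compression benefit can be estimated before compression is enabled. ZFS reports the achieved ratio in the compressratio property as a multiplier (uncompressed size divided by compressed size). The sketch below approximates that calculation in Python. It assumes the data is compressed in 128 KB records (the ZFS default recordsize) using deflate, the algorithm behind the ZFS gzip-N settings (gzip alone means gzip-6, matching zlib's default level here). Records that do not shrink are counted at their original size, which is a simplification of what ZFS actually does. The function name is hypothetical, and real ratios depend on the data and the chosen algorithm.

```python
import os
import zlib
from typing import Tuple

RECORD_SIZE = 128 * 1024  # assumed record size (ZFS default recordsize is 128K)


def compress_ratio(data: bytes, level: int = 6,
                   record_size: int = RECORD_SIZE) -> Tuple[int, int, float]:
    """Estimate (logical_bytes, physical_bytes, ratio) for deflate-compressed records.

    Simplification: a record is stored uncompressed when deflate does not shrink it.
    """
    logical = len(data)
    physical = 0
    for offset in range(0, logical, record_size):
        record = data[offset:offset + record_size]
        physical += min(len(record), len(zlib.compress(record, level)))
    ratio = logical / physical if physical else 1.0
    return logical, physical, ratio


if __name__ == "__main__":
    log_like = b"2013-05-11 12:00:00 INFO DataNode: Receiving block blk_1234\n" * 10000
    print("log-like data: %.1fx" % compress_ratio(log_like)[2])
    print("random data:   %.1fx" % compress_ratio(os.urandom(256 * 1024))[2])
```

Repetitive text such as Hadoop logs compresses well, while already-compressed or random data stays at about 1.0x. That contrast is why the benefit depends on the workload.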

Page 19

Performance Analysis

Each Oracle Solaris Zone can have a different workload: disk I/O, network I/O, CPU, memory, or a combination of these. In addition, a single Oracle Solaris Zone can exhaust the resources of the entire system.

DTrace: a comprehensive, advanced tracing tool for troubleshooting systemic problems in real time.

Page 20

zonestat

The zonestat command lets us monitor all the Solaris zones running in our environment and provides real-time statistics for CPU, memory, and network utilization.

root@global_zone:~# zonestat 10 10

Interval: 1, Duration: 0:00:10
SUMMARY  Cpus/Online: 128/12  PhysMem: 256G  VirtMem: 259G
                ----CPU----- --PhysMem-- --VirtMem-- --PhysNet--
           ZONE   USED %PART  USED %USED  USED %USED PBYTE %PUSE
        [total] 118.10 92.2% 24.6G 9.62% 60.0G 23.0% 18.4E  100%
       [system]   0.00 0.00% 9684M 3.69% 40.5G 15.5%     -     -
     data-node3  42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E  100%
     data-node1  41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E  100%
     data-node2  33.97 26.5% 4851M 1.85% 6145M 2.30% 18.4E  100%
         global   0.34 0.27%  283M 0.10%  420M 0.15%  2192 0.00%
      name-node   0.15 0.11%  419M 0.15%  718M 0.26%   126 0.00%
  sec-name-node   0.00 0.00%  205M 0.07%  363M 0.13%     0 0.00%
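The zonestat summary can also be post-processed, for example to confirm how much of the CPU load comes from the DataNode zones. The sketch below assumes the summary rows have been captured as text with one zone per line, with the zone name in the first column and CPU USED in the second, as in the output above. The function names and the SAMPLE variable are hypothetical.

```python
from typing import Dict

# Zone rows from the zonestat summary above, one zone per line.
# Only the first two columns (ZONE and CPU USED) are used here.
SAMPLE = """\
[total] 118.10 92.2% 24.6G 9.62% 60.0G 23.0% 18.4E 100%
[system] 0.00 0.00% 9684M 3.69% 40.5G 15.5% - -
data-node3 42.13 32.9% 4897M 1.86% 6146M 2.30% 18.4E 100%
data-node1 41.49 32.4% 4891M 1.86% 6173M 2.31% 18.4E 100%
data-node2 33.97 26.5% 4851M 1.85% 6145M 2.30% 18.4E 100%
global 0.34 0.27% 283M 0.10% 420M 0.15% 2192 0.00%
name-node 0.15 0.11% 419M 0.15% 718M 0.26% 126 0.00%
sec-name-node 0.00 0.00% 205M 0.07% 363M 0.13% 0 0.00%
"""


def cpu_used_by_zone(report: str) -> Dict[str, float]:
    """Map each zone name to its CPU USED value (in CPUs' worth of time)."""
    usage: Dict[str, float] = {}
    for line in report.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            try:
                usage[fields[0]] = float(fields[1])
            except ValueError:
                continue  # skip header lines such as "ZONE USED %PART ..."
    return usage


def share_of_total(usage: Dict[str, float], prefix: str) -> float:
    """Fraction of the [total] CPU usage consumed by zones whose name starts with prefix."""
    total = usage.get("[total]", 0.0)
    part = sum(v for zone, v in usage.items() if zone.startswith(prefix))
    return part / total if total else 0.0


if __name__ == "__main__":
    usage = cpu_used_by_zone(SAMPLE)
    print(f"data-node share of CPU: {share_of_total(usage, 'data-node'):.1%}")
```

With the sample above, the three data-node zones account for about 99.6% of the 118.10 CPUs in use.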

Page 21

DISK I/O Performance Monitoring

Page 22

fsstat

The fsstat command allows us to monitor file system activity per file system type or mount point, or (with the -Z option) per Solaris zone.

For example, monitoring activity on all ZFS file systems, broken down by zone, at 10-second intervals:

root@global_zone:~# fsstat -Z zfs 10 10

  new  name  name  attr  attr lookup rddir  read  read write write
 file remov  chng   get   set    ops   ops   ops bytes   ops bytes
    0     0     0   744     0  11.4K     0 6.01K 5.87M     0     0 zfs:global
    0     0     0   151     0  3.27K     0 1.41K 1.94M     7 1.42K zfs:data-node1
    0     0     0   359     0  8.72K     0 2.75K 3.95M    22 4.06K zfs:data-node2
    0     0     0   413     0  9.03K     0 2.98K 4.22M    21 4.34K zfs:data-node3
    0     0     0    14     0     51     0     0     0     0     0 zfs:name-node
    0     0     0    14     0     51     0     0     0     0     0 zfs:sec-name-node
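Because fsstat abbreviates large values with suffixes such as K and M, a small parser helps when comparing zones. The sketch below ranks zones by the read-bytes column of the output above. It assumes the column order shown in that output (eleven numeric columns followed by the fstype:zone label) and treats K and M as binary multiples (1,024 and 1,048,576). Both are assumptions, and the function names and SAMPLE variable are hypothetical.

```python
from typing import List, Tuple

# Assumption: size suffixes are binary multiples.
SUFFIXES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

READ_BYTES = 8  # assumed 0-based index of the "read bytes" column

# Per-zone rows from the fsstat output above.
SAMPLE = """\
    0     0     0   744     0  11.4K     0 6.01K 5.87M     0     0 zfs:global
    0     0     0   151     0  3.27K     0 1.41K 1.94M     7 1.42K zfs:data-node1
    0     0     0   359     0  8.72K     0 2.75K 3.95M    22 4.06K zfs:data-node2
    0     0     0   413     0  9.03K     0 2.98K 4.22M    21 4.34K zfs:data-node3
    0     0     0    14     0     51     0     0     0     0     0 zfs:name-node
    0     0     0    14     0     51     0     0     0     0     0 zfs:sec-name-node
"""


def to_number(field: str) -> float:
    """Convert an fsstat value such as '5.87M' or '744' to a plain number."""
    if field and field[-1] in SUFFIXES:
        return float(field[:-1]) * SUFFIXES[field[-1]]
    return float(field)


def read_bytes_by_zone(report: str) -> List[Tuple[str, float]]:
    """Return (zone, read bytes) pairs, largest reader first."""
    rows = []
    for line in report.splitlines():
        fields = line.split()
        if len(fields) != 12 or ":" not in fields[-1]:
            continue  # skip the two header lines and anything unexpected
        zone = fields[-1].split(":", 1)[1]
        rows.append((zone, to_number(fields[READ_BYTES])))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for zone, nbytes in read_bytes_by_zone(SAMPLE):
        print(f"{zone:15} {nbytes / 1024 ** 2:8.2f} MB read")
```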

Page 23

DISK I/O - Cont'd

Run the DTrace iopattern script, as shown, to determine whether the disk I/O workload is random or sequential (MIN, MAX, and AVG are I/O sizes in bytes; KR and KW are kilobytes read and written):

root@global_zone:~# /usr/dtrace/DTT/iopattern
%RAN %SEQ COUNT  MIN     MAX    AVG     KR   KW
  69   31   236 1024 1048576 448830 103441    0
  75   25   577  512 1048576 327938 184306  479
  92    8   598  512 1048576 198293 114275 1525
  74   26   379  512 1048576 330296 121954  294
  66   34   281 1024 1048576 500550 137358    0
  80   20   346 1024 1048576 332114 112218    0
  81   19   444  512 1048576 290734 124694 1366
  65   35   337  512 1048576 490375 161139  244
  75   25   704  512 1048576 353086 241105 1642
  75   25   444 1024 1048576 386634 167642    0
  77   23   666 1024 1048576 397105 258274    0
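The random/sequential split rests on a simple rule: an I/O counts as sequential when it starts exactly where the previous I/O on the same device ended, and as random otherwise. The Python sketch below applies that rule to a list of (device, byte offset, size, is_read) tuples and produces columns similar to the iopattern output. It approximates the idea behind the DTraceToolkit script rather than porting it. The tuple layout, the function name, and the choice to count a device's first I/O as random are assumptions.

```python
from typing import Dict, Iterable, List, Tuple

# One I/O request: (device, byte offset, size in bytes, is_read).
IORequest = Tuple[str, int, int, bool]


def io_pattern(ios: Iterable[IORequest]) -> Dict[str, int]:
    """Summarize one interval of I/O in iopattern-style columns."""
    last_end: Dict[str, int] = {}  # where the previous I/O on each device ended
    seq = ran = 0
    sizes: List[int] = []
    read_bytes = write_bytes = 0
    for device, offset, size, is_read in ios:
        # Sequential if this I/O starts where the last one on this device ended;
        # a device's first I/O has no predecessor and counts as random.
        if last_end.get(device) == offset:
            seq += 1
        else:
            ran += 1
        last_end[device] = offset + size
        sizes.append(size)
        if is_read:
            read_bytes += size
        else:
            write_bytes += size
    count = len(sizes)
    if count == 0:
        return {"%RAN": 0, "%SEQ": 0, "COUNT": 0, "MIN": 0, "MAX": 0,
                "AVG": 0, "KR": 0, "KW": 0}
    pct_ran = 100 * ran // count
    return {
        "%RAN": pct_ran,
        "%SEQ": 100 - pct_ran,  # the two percentages always sum to 100
        "COUNT": count,
        "MIN": min(sizes),
        "MAX": max(sizes),
        "AVG": sum(sizes) // count,
        "KR": read_bytes // 1024,
        "KW": write_bytes // 1024,
    }


if __name__ == "__main__":
    sample = [("sd0", 0, 4096, True), ("sd0", 4096, 4096, True),
              ("sd0", 8192, 4096, True), ("sd0", 1_000_000, 512, False)]
    print(io_pattern(sample))
```

Tracking the last offset per device matters: when several disks are active at once, interleaved requests to different devices would otherwise all look random.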

Page 24

Visualization

For more information about dim_STAT, see http://dimitrik.free.fr

Page 25

Flame Graphs

For more information, see http://dtrace.org/blogs/brendan/2011/12/16/flame-graphs
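Flame graphs are built from stack samples that are first collapsed into a "folded" format. This format has one line per unique stack, with frames joined by semicolons from the root to the leaf, followed by the number of times that stack was sampled. The FlameGraph tools described at the link above render this format with flamegraph.pl. The sketch below shows the collapsing step in Python, assuming each sample is a list of frame names printed leaf first, as DTrace's stack() and ustack() actions print them. The frame names in the example are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def fold_stacks(samples: Iterable[Sequence[str]]) -> List[str]:
    """Collapse leaf-first stack samples into folded 'root;...;leaf count' lines."""
    counts = Counter(";".join(reversed(stack)) for stack in samples)
    return [f"{stack} {n}" for stack, n in sorted(counts.items())]


if __name__ == "__main__":
    # Hypothetical samples, leaf frame first.
    samples = [
        ["read", "DataNode.readBlock", "main"],
        ["read", "DataNode.readBlock", "main"],
        ["write", "DataNode.writeBlock", "main"],
    ]
    print("\n".join(fold_stacks(samples)))
```

Each output line becomes one input line for the flame graph renderer, where the count determines the width of that stack's boxes.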

Page 26

Hadoop on an Oracle SPARC T4-2 Server

Source https://blogs.oracle.com/taylor22

Page 28

Questions

Follow us on