Apache Ambari BOF - OpenStack - Hadoop Summit 2013

Upload: hortonworks

Post on 26-Jan-2015

DESCRIPTION

Apache Ambari BOF Meet Up @ Hadoop Summit 2013 OpenStack http://www.meetup.com/Apache-Ambari-User-Group/events/119184782/

TRANSCRIPT

Page 1: Apache Ambari BOF - OpenStack - Hadoop Summit 2013

© Hortonworks Inc. 2013

Hadoop + OpenStack integration Roadmap

Himanshu Bari

June 28th, 2013

Sr. Product Manager [email protected]

Page 2

Disclaimer

•  This document may contain product features and technology directions that are under development or may be under development in the future.
•  Technical feasibility, market demand, user feedback, and the Apache Software Foundation community development process can all affect timing and final delivery.
•  This document’s description of these features and technology directions does not represent a contractual commitment from Hortonworks to deliver these features in any generally available product.
•  Product features and technology directions are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.

Page 3

Agenda

Why Hadoop on OpenStack

Use cases

A bit under the hood

Page 4

Big Data & Cloud Intersection

Point → 2013

Big Data & Cloud are top priority for CIOs

Page 5

OpenStack is an open source cloud management platform (Apache License)

•  Horizon (dashboard)
•  Nova (compute), Quantum (networking), Cinder (block store)
•  Swift object store
•  Glance image service
•  Keystone identity service
•  Ceilometer metering and Heat orchestration (integrated)
•  Multi-hypervisor & guest OS support

Page 6

OpenStack has overtaken Amazon AWS in market awareness…

Source: Google Trends

Page 7

Maturing quickly with broad support

•  Pushed by 150+ vendors
•  Millions of dollars in venture capital
•  Early adoption across all verticals

Page 8

Why Hadoop & OpenStack?

Hadoop provides a greenfield use case
•  Net new workload
•  Needs scale-out infrastructure
•  Shared platform

OpenStack provides the perfect cloud platform
•  Operational agility
•  Supports scale-out architecture
•  Deployment choice across public & private clouds

Marries two of the largest open source movements
1.  Open source communities provide the fastest path to innovation
2.  Open source is changing the game as economics and accessibility serve to accelerate cloud & big data market trends
3.  Both are attracting major ecosystem players: IBM, Red Hat (RHT), HP, Rackspace (RAX), etc.

Page 9

Accelerate Adoption of Hadoop on OpenStack

The leading contributor to Apache Hadoop

The leading system integrator for OpenStack

The leading contributor to OpenStack

Apache Hadoop… The killer app for OpenStack

Page 10

Collaborating on Project Savanna

Project Savanna: automate deployment of Apache Hadoop on OpenStack

[Diagram: the Savanna elastic Hadoop controller and Ambari Hadoop management run on OpenStack infrastructure and deploy Hadoop clusters (NameNodes shown as NN) backed by Swift storage; numbered callouts 1-3 and +/- markers illustrate the features below]

1.  Cluster templates: deploy preconfigured Hadoop clusters in seconds from Horizon or Ambari
2.  HDFS-Swift connectors: move data between HDFS and Swift object storage
3.  Simplified elasticity

Page 11

Agenda

Why Hadoop on OpenStack

Use cases

A bit under the hood

Page 12

Project Savanna design goals

•  Focus on API-driven tight integration
•  Hide Hadoop complexity through APIs: an “It Just Works” experience
•  Fully leverage virtualization: scalability, reliability, performance

Page 13

Problems driving use cases

Cluster requirements vary by business unit, data type & analytics use case:
•  Business units: Finance, Compliance, IT, Marketing
•  Data types: Web, Mobile, Sensor
•  Analytics: Interactive, Batch
•  Environments: Dev, QA, Prod

Problems:
•  Operational nightmare of supporting multiple cluster flavors
•  Lack of agility
•  Underutilized resources
•  Maintenance complications
•  Can’t migrate from public to private cloud

Page 14

Provisioning-related use cases

-  Frequent dev/test/staging cluster provisioning requests
-  Migrations from staging to prod and vice versa
-  Reduce operator error in cluster provisioning
-  Migrate away from Amazon EMR for ad hoc analytics requests to support experimentation

Page 15

Simplified provisioning

Phase-1: Template based provisioning
•  Cluster template, e.g. QA cluster
•  Node template
   a.  Resource based, e.g. node.Large
   b.  Function based, e.g. node.NameNode
•  Use as is: single-click provisioning
•  Modify: update VM resource allocation, service-to-VM mapping and service config, then provision and/or save the template

Phase-2: Hadoop as a service (job flow based provisioning)
•  Pick job type (+ Cascading, streaming & custom jar)
•  Upload data to Swift
•  Get results in Swift
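The template hierarchy on this slide (node templates that pair a resource-based flavor with function-based roles, grouped into a cluster template that can be used as is or modified) can be sketched in Python. This is a minimal illustration under assumptions: the class names, fields, the `modify` helper, the node.Medium flavor and the lowercase process labels are all hypothetical and are not Savanna's actual data model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class NodeTemplate:
    """Hypothetical node template: a resource-based flavor plus function-based roles."""
    name: str
    flavor: str        # resource based, e.g. "node.Large"
    processes: tuple   # function based, e.g. ("namenode",)


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical cluster template: node templates with an instance count each."""
    name: str
    node_groups: tuple  # tuple of (NodeTemplate, count) pairs

    def total_nodes(self) -> int:
        return sum(count for _, count in self.node_groups)


def modify(template, new_name, counts=None, flavors=None):
    """'Modify' path: copy the template, overriding counts/flavors by node template name."""
    counts = counts or {}
    flavors = flavors or {}
    groups = []
    for node, count in template.node_groups:
        if node.name in flavors:
            node = replace(node, flavor=flavors[node.name])
        groups.append((node, counts.get(node.name, count)))
    return replace(template, name=new_name, node_groups=tuple(groups))


# "Use as is": a QA cluster template ready for single-click provisioning.
master = NodeTemplate("master", "node.Large", ("namenode", "jobtracker"))
worker = NodeTemplate("worker", "node.Medium", ("datanode", "tasktracker"))
qa = ClusterTemplate("qa-cluster", ((master, 1), (worker, 3)))

# "Modify": more and bigger workers, saved as a new template; qa is unchanged.
qa_large = modify(qa, "qa-cluster-large", counts={"worker": 10},
                  flavors={"worker": "node.Large"})
print(qa.total_nodes(), qa_large.total_nodes())  # 4 11
```

Making the templates immutable means "modify and save" always yields a new template, so a shared QA template cannot be changed by accident.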

Page 16

Ambari embedded in Horizon

Page 17

Swift object store support

Phase-1: Read/write data from/to Swift object stores
•  Option-1: Copy data from Swift to HDFS, run MapReduce and copy results back to Swift
•  Option-2: Run MapReduce directly on top of Swift (output data still needs to be copied from HDFS to Swift)

Phase-2: Bug fixes & optimizations

Page 18

Elasticity-related use cases

-  Commission a new node, or decommission a node for maintenance
-  For dev/test/staging clusters: automatically vary cluster data & compute capacity based on tenant, workload, time of day, resource utilization, etc.
-  For production clusters: automatically vary compute capacity

Page 19

Elasticity

[Matrix: node elasticity (compute and/or data), manual vs. rule based, plotted against cluster life, long lived vs. short lived (Swift or HDFS used for storage), split into Phase-1 and Phase-2]

•  Handle variable workloads, e.g. alter the cluster compute node count for peak/off-peak hours
•  Job flow based clusters for ad hoc analysis
•  Best for Dev/QA use
•  Best for predictable workloads
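Rule based elasticity for peak/off-peak hours amounts to mapping the current hour to a target compute node count and emitting an add/remove request. A minimal sketch, assuming time-window rules and compute-only nodes (for example TaskTrackers without DataNodes, so removing them does not require HDFS decommissioning); `ScheduleRule`, `desired_compute_nodes` and `scaling_action` are hypothetical names, not part of Savanna or Ambari.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleRule:
    """Hypothetical rule: run `compute_nodes` compute-only VMs during [start_hour, end_hour)."""
    start_hour: int
    end_hour: int
    compute_nodes: int

    def matches(self, hour: int) -> bool:
        if self.start_hour <= self.end_hour:
            return self.start_hour <= hour < self.end_hour
        # Window wraps past midnight, e.g. 22:00 -> 06:00.
        return hour >= self.start_hour or hour < self.end_hour


def desired_compute_nodes(rules, hour, default):
    """First matching rule wins; otherwise fall back to the default node count."""
    for rule in rules:
        if rule.matches(hour):
            return rule.compute_nodes
    return default


def scaling_action(current, desired):
    """Translate the target into an add/remove request for the provisioning layer."""
    delta = desired - current
    if delta > 0:
        return ("add", delta)
    if delta < 0:
        return ("remove", -delta)
    return ("none", 0)


rules = [ScheduleRule(9, 18, 20),   # peak business hours
         ScheduleRule(22, 6, 4)]    # overnight off-peak
for hour in (3, 10, 19, 23):
    print(hour, desired_compute_nodes(rules, hour, default=8))
print(scaling_action(current=8, desired=20))  # ('add', 12)
```

Scaling only compute capacity matches the production use case on the previous slide; varying data capacity as well would also have to account for HDFS block re-replication.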

Page 20

Multi-tenancy-related use cases

-  Improve server utilization by creating a common server pool for Hadoop and non-Hadoop workloads
-  Simplify maintenance & upgrade testing with the ability to run multiple Hadoop clusters with different versions on the same server pool
-  Support varying SLAs based on tenant and workload through resource isolation provided by VMs
-  Simplify chargeback/showback

Page 21

Multi-tenancy

Phase-1
•  Access isolation
   -  Single sign-on for Ambari & HUE through Keystone integration
   -  Dedicated Ambari & HUE instance per cluster per tenant
•  Resource isolation
   -  CPU and memory isolation through VMs
   -  Ability to pin a Hadoop VM to a given set of physical hosts to enable per-tenant physical host isolation
•  Version isolation
   -  Choice of Hadoop versions for tenants

Phase-2
•  Access isolation
   -  Single Ambari instance per tenant (multi-cluster support in Ambari)
   -  Keystone enhancements to support Hadoop job flow level RBAC for Hadoop as a service
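Pinning a tenant's Hadoop VMs to a set of physical hosts means the placement step must keep that tenant on its hosts and keep everyone else off them. A minimal sketch of that rule, assuming a simple RAM-only capacity model; `place_vm` and the host and tenant names are hypothetical, and in a real OpenStack deployment this would more likely be expressed through Nova scheduler filters or host aggregates than custom code.

```python
def place_vm(tenant, ram_mb, free_ram, pinning):
    """Pick a physical host for a tenant's Hadoop VM.

    free_ram: host name -> free RAM in MB (updated in place on success).
    pinning:  tenant -> set of hosts reserved exclusively for that tenant.
    """
    if tenant in pinning:
        allowed = set(pinning[tenant])
    else:
        # Unpinned tenants must stay off every host reserved for someone else.
        reserved = set().union(*pinning.values())
        allowed = set(free_ram) - reserved
    candidates = [h for h in allowed if free_ram.get(h, 0) >= ram_mb]
    if not candidates:
        raise RuntimeError(f"no host with {ram_mb} MB free for tenant {tenant!r}")
    # Spread load: most free RAM first, host name as a deterministic tie-breaker.
    host = min(candidates, key=lambda h: (-free_ram[h], h))
    free_ram[host] -= ram_mb
    return host


free_ram = {"host-a": 65536, "host-b": 65536, "host-c": 32768}
pinning = {"finance": {"host-c"}}
print(place_vm("finance", 8192, free_ram, pinning))    # host-c
print(place_vm("marketing", 8192, free_ram, pinning))  # host-a
print(place_vm("marketing", 8192, free_ram, pinning))  # host-b
```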

Page 22

Agenda

Why Hadoop on OpenStack

Use cases

A bit under the hood

Page 23

Savanna logical architecture

[Diagram components:]
•  Horizon + Savanna UI
•  Savanna Controller: API, plugin manager, configuration, elasticity, orchestration
•  HDP Savanna plugin: API, Hadoop provisioning, Ambari template management
•  Hadoop cluster: Ambari + API
•  OpenStack infrastructure: network, storage, security, compute
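The plugin manager lets the controller stay distribution-neutral and delegate Hadoop-specific work to a plugin such as the HDP plugin. A minimal sketch of that pattern, assuming a registry keyed by plugin name and supported version; the method names (`versions`, `configure_cluster`, `start_cluster`) and classes are illustrative and are not Savanna's actual plugin SPI.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract; method names are illustrative only."""

    name = ""

    @abstractmethod
    def versions(self):
        """Hadoop distribution versions this plugin can deploy."""

    @abstractmethod
    def configure_cluster(self, cluster):
        """Install and configure Hadoop on already-provisioned VMs."""

    @abstractmethod
    def start_cluster(self, cluster):
        """Start all Hadoop services."""


class PluginManager:
    """Registry the controller consults to route a cluster request to a plugin."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        if plugin.name in self._plugins:
            raise ValueError(f"plugin {plugin.name!r} already registered")
        self._plugins[plugin.name] = plugin

    def get(self, name, version):
        if name not in self._plugins:
            raise KeyError(f"unknown plugin {name!r}")
        plugin = self._plugins[name]
        if version not in plugin.versions():
            raise ValueError(f"plugin {name!r} does not support version {version!r}")
        return plugin


class HDPPlugin(ProvisioningPlugin):
    """Toy stand-in for the HDP plugin: records the steps it would hand to Ambari."""

    name = "hdp"

    def __init__(self):
        self.steps = []

    def versions(self):
        return ["1.3"]

    def configure_cluster(self, cluster):
        self.steps.append(f"install Ambari server on {cluster['ambari_host']}")
        self.steps.append("pass cluster template to Ambari")

    def start_cluster(self, cluster):
        self.steps.append("Ambari starts all services")


manager = PluginManager()
manager.register(HDPPlugin())
plugin = manager.get("hdp", "1.3")
cluster = {"name": "qa-cluster", "ambari_host": "qa-master-001"}
plugin.configure_cluster(cluster)
plugin.start_cluster(cluster)
print(plugin.steps)
```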

Page 24

Provisioning workflow overview

1.  A cluster request is submitted from Horizon to the Savanna Controller + HDP OpenStack Plugin.
2.  Savanna uses Nova and Glance to provision vanilla VMs from a VM image that is either OS only or preloaded with HDP bits.
3.  The HDP plugin installs Ambari.
4.  The HDP plugin passes the cluster template to Ambari.
5.  Ambari configures all services and starts the cluster.

Resulting Hadoop cluster: Ambari Server, HUE, NN (NameNode), JT (JobTracker), DN (DataNode), DN, ……
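The ordering in this workflow (boot every VM first, then install Ambari on one host, then hand the template to Ambari) can be sketched as a plan generator. This assumes each node group is described as (group name, count, components); the hostname pattern, the image name and the lowercase component labels, including `ambari_server`, are hypothetical and not Ambari's or Savanna's identifiers.

```python
def expand_node_groups(cluster_name, node_groups):
    """Turn (group, count, components) entries into one record per VM."""
    hosts = []
    for group, count, components in node_groups:
        for i in range(1, count + 1):
            hosts.append({
                "hostname": f"{cluster_name}-{group}-{i:03d}",
                "components": list(components),
            })
    return hosts


def provisioning_plan(cluster_name, image, node_groups):
    """Order the steps from the slide: boot VMs, install Ambari, hand over the template."""
    hosts = expand_node_groups(cluster_name, node_groups)
    ambari = next((h["hostname"] for h in hosts
                   if "ambari_server" in h["components"]), None)
    if ambari is None:
        raise ValueError("no node group runs ambari_server")
    steps = [f"boot {h['hostname']} from image {image!r}" for h in hosts]  # Nova + Glance
    steps.append(f"install Ambari server on {ambari}")                       # HDP plugin
    steps.append("pass cluster template to Ambari")                           # HDP plugin
    steps.append("Ambari configures all services and starts the cluster")     # Ambari
    return steps


groups = [
    ("master", 1, ["ambari_server", "hue", "namenode", "jobtracker"]),
    ("worker", 2, ["datanode", "tasktracker"]),
]
for step in provisioning_plan("qa", "centos-6-hdp-1.3", groups):
    print(step)
```

Using an image preloaded with HDP bits would shorten the work Ambari does after step 4, but it does not change this ordering.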

Page 25

Ambari based cluster templates

Preconfigured information across all clusters using this template:
•  HDP stack information
   -  Services, components & packages
   -  Description
   -  Package dependencies
•  Hadoop topology
   -  Component / host group mapping
•  Hadoop configuration
   -  All Hadoop configuration for the cluster (hundreds of parameters and their values)

Per-cluster pluggable data:
-  User names
-  Passwords
-  Host names
-  Host VM flavors (CPU/Mem)
-  Node count per host group
-  ……….
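The split between shared template content and per-cluster pluggable data can be sketched as a function that validates the pluggable part and merges it into a copy of the template. This is an illustration under assumptions: the dictionary layout and the key names (`user_name`, `password`, `vm_flavors`, `node_counts`) are hypothetical and cover only a subset of the slide's list; they are not Ambari's template format.

```python
import copy

# Hypothetical key names for a subset of the per-cluster pluggable data on the slide.
REQUIRED_KEYS = ("user_name", "password", "vm_flavors", "node_counts")


def instantiate(template, cluster_data):
    """Combine a shared template with per-cluster data, leaving the template untouched."""
    missing = [key for key in REQUIRED_KEYS if key not in cluster_data]
    if missing:
        raise ValueError(f"missing per-cluster data: {missing}")
    groups = set(template["topology"])
    for key in ("vm_flavors", "node_counts"):
        absent = groups - set(cluster_data[key])
        if absent:
            raise ValueError(f"{key} has no entry for host groups {sorted(absent)}")
    cluster = copy.deepcopy(template)          # the template stays reusable
    cluster["cluster_data"] = copy.deepcopy(cluster_data)
    return cluster


template = {
    "stack": {"name": "HDP", "version": "1.3", "services": ["HDFS", "MAPREDUCE"]},
    "topology": {"master": ["NAMENODE", "JOBTRACKER"],
                 "worker": ["DATANODE", "TASKTRACKER"]},
    "configuration": {"hdfs-site": {"dfs.replication": "3"}},
}
qa = instantiate(template, {
    "user_name": "admin", "password": "changeme",
    "vm_flavors": {"master": "m1.large", "worker": "m1.medium"},
    "node_counts": {"master": 1, "worker": 3},
})
print(qa["cluster_data"]["node_counts"])
```

Validating against the template's host groups catches a cluster request that forgets a flavor or node count before any VM is booted.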

Page 26

Swift object store support (HADOOP-8545)

[Diagram: MapReduce, Pig & Hive read input data from and write output results to Swift through the HDFS + Swift bridge, authenticated via Keystone. A directory Dir containing file1, file2 and file3 maps to objects Dir/file1 and Dir/file2 in containers on Swift store-1, and Dir/file3 in a container on Swift store-n.]

Supported operations: create, read, write, delete, mkdir, ls, mv & stat
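Swift stores objects in a flat namespace per container, so a directory like Dir only exists as a shared object-name prefix (Dir/file1, Dir/file2), and Swift offers no atomic rename. The sketch below illustrates how ls and mv can be emulated on top of that; it is not the Hadoop Swift connector's code. The swift://<container>.<service>/<path> URI layout follows the form used by Hadoop's Swift filesystem, but the service name store-1, the helper names and the in-memory dict standing in for a container are hypothetical.

```python
from urllib.parse import urlparse


def parse_swift_uri(uri):
    """Split swift://<container>.<service>/<object path> into its parts."""
    parts = urlparse(uri)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift URI: {uri}")
    container, dot, service = parts.netloc.partition(".")
    if not (container and dot and service):
        raise ValueError(f"expected <container>.<service> in {uri}")
    return container, service, parts.path.lstrip("/")


def list_directory(object_names, directory):
    """Emulate `ls` on Swift's flat namespace: immediate children of a '/' prefix."""
    prefix = directory.rstrip("/") + "/"
    children = set()
    for name in object_names:
        if name.startswith(prefix):
            child = name[len(prefix):].split("/", 1)[0]
            if child:
                children.add(child)
    return sorted(children)


def rename_directory(store, src, dst):
    """Emulate `mv` as copy-then-delete of every object under the source prefix."""
    src_prefix, dst_prefix = src.rstrip("/") + "/", dst.rstrip("/") + "/"
    for name in [n for n in store if n.startswith(src_prefix)]:
        store[dst_prefix + name[len(src_prefix):]] = store[name]  # copy
        del store[name]                                            # delete


container = {"Dir/file1": b"a", "Dir/file2": b"b", "Dir/sub/file4": b"c", "Other/x": b"d"}
print(parse_swift_uri("swift://container-1.store-1/Dir/file1"))
print(list_directory(container, "Dir"))   # ['file1', 'file2', 'sub']
rename_directory(container, "Dir", "Out")
print(sorted(container))
```

Because mv touches every object under the prefix, renaming a large directory costs one copy and one delete per object, which is one reason job output is often written once to its final location.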

Page 27

Hadoop Virtualization Extensions (HVE)

• Account for the additional ‘node group’ layer so replicas do not end up on VMs in the same hypervisor
• Available in HDP 1.3. Work in progress to enable in HDP 2.0 (YARN & HDFS)

Topology:
Data Center
   Rack-1
      Node group-1: VM1, VM2
      Node group-2: VM1, VM2
   Rack-2
      Node group-1: VM1, VM2
      Node group-2: VM1, VM2

HVE-aware policies:
-  Replica (place, choose & remove) policies
-  Balancer policies
-  Task placement & container allocation (YARN)
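The node-group rule can be illustrated with a simplified version of the familiar HDFS default layout (first replica on the writer, second on a different rack, third on the second replica's rack) plus the HVE constraint that no two replicas share a node group. This is a sketch under assumptions: it ignores load, free space and randomization and is not HDFS's actual placement code. The rack and node-group layout follows the slide's diagram; `Location`, `group_key` and `choose_replicas` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    rack: str
    node_group: str   # with HVE: the physical host/hypervisor a VM runs on
    vm: str


def group_key(loc):
    """Node groups are scoped by rack, mirroring a /rack/nodegroup network path."""
    return (loc.rack, loc.node_group)


def choose_replicas(writer, candidates, replication=3):
    """Simplified rack-aware placement with the extra HVE rule: one replica per node group."""
    chosen = [writer]
    used = {group_key(writer)}

    def pick(predicate):
        for loc in candidates:
            if group_key(loc) not in used and predicate(loc):
                chosen.append(loc)
                used.add(group_key(loc))
                return loc
        return None

    def anywhere(loc):
        return True

    if len(chosen) < replication:
        # Second replica: a different rack if one is available.
        second = pick(lambda loc: loc.rack != writer.rack)
        if second is None:
            second = pick(anywhere)
        if second is not None and len(chosen) < replication:
            # Third replica: same rack as the second, but a different node group.
            if pick(lambda loc: loc.rack == second.rack) is None:
                pick(anywhere)
    # Any further replicas: any unused node group.
    while len(chosen) < replication and pick(anywhere) is not None:
        pass
    return chosen


cluster = [Location(r, g, v) for r in ("rack-1", "rack-2")
           for g in ("ng-1", "ng-2") for v in ("vm1", "vm2")]
writer = Location("rack-1", "ng-1", "vm1")
for loc in choose_replicas(writer, cluster):
    print(loc)
```

Without the node-group check, a purely rack-aware policy could place the third replica on rack-2/ng-1/vm2, the same hypervisor as the second replica, so one host failure would take out two copies.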