hadoop on cloud. why and how?

44
1 © Cloudera, Inc. All rights reserved. Hadoop on Cloud. Why and How? Andrei Savu | Tech Lead, Cloudera Director Silicon Valley Cloud Computing Group | Nov 18, 2015

Upload: andrei-savu

Post on 19-Jan-2017

906 views

Category:

Technology


0 download

TRANSCRIPT

1© Cloudera, Inc. All rights reserved.

Hadoop on Cloud. Why and How?Andrei Savu | Tech Lead, Cloudera DirectorSilicon Valley Cloud Computing Group | Nov 18, 2015

2© Cloudera, Inc. All rights reserved.

About me

Tech Lead on Cloudera Director

Previously founder of axemblr.com

Contributed to Apache Whirr (PMC) & jclouds.

Twitter: @andreisavu

3© Cloudera, Inc. All rights reserved.

Cloudera Directorcloudera.com/director

Deploy and manage enterprise-grade Hadoop in the cloud

AWS & Google CloudExtensible via plugins

Journey to the Cloud

5© Cloudera, Inc. All rights reserved.

Do you use a public or private cloud?

How do you run and manage Hadoop?

6© Cloudera, Inc. All rights reserved.

What is this talk about?

State of the WorldArchitectural PatternsImagine the Future

7© Cloudera, Inc. All rights reserved.

Gartner's 2015 Hype Cycle for Emerging Technologies (source)

Advanced AnalyticsHybrid CloudInternet of Things

8© Cloudera, Inc. All rights reserved.

Hybrid Clouds

Cloud ExchangeApplication PortabilityPrivate-PublicPublic-Public

9© Cloudera, Inc. All rights reserved.

Cloud Wars

AWSMicrosoft AzureGoogle CloudVMWareOpenstacketc.

10© Cloudera, Inc. All rights reserved.

Data has Mass and Gravity

11© Cloudera, Inc. All rights reserved.

Hadoop EnvironmentsOn-Premise versus Cloud

On-Premise CloudStorage Direct Attached Direct Attached or Object Store

Data Not shared across clusters Shared across multiple clusters

Sizing Fixed-size Dynamic based on load

Usage Model All users share cluster Clusters created as needed for apps/users

Resource Management (YARN)

HDFS

Process Discover Model Serve

Industry Standard Servers (CPU, Memory, & Direct Attached Storage)

Resource Management (YARN)

HDFS

Process Discover Model Serve

Industry Standard Servers (CPU & Memory)

Object Storage

12© Cloudera, Inc. All rights reserved.

Cloud providers shipping distributions of Hadoop

IntegrationUnlock Query EnginesMigration workloads

Is that a sustainable advantage? Or just a temporary stop gap?

13© Cloudera, Inc. All rights reserved.

Maturity level

On-prem vs. CloudMonitoringDev / Test / ProdAvailabilityDurability

14© Cloudera, Inc. All rights reserved.

Common Architectural Patterns in the Cloud

Object Storage

Source Data Seed Data Backup/DR

ETL/MODELING(Spark, MapReduce)

• Short-running clusters• Elastic workload• No local storage

necessary

|WASB |SWIFT |BLOB

• Long-running clusters• Sized to demand• Some local storage

BI/ANALYTICS(Impala, Solr)

• Fixed clusters • Periodic sync• Default to local

storage

APP DELIVERY(HBase, Kudu)

15© Cloudera, Inc. All rights reserved.

Cluster lifecycle management

Create / TerminateDiscoveryMetadataMonitoring

16© Cloudera, Inc. All rights reserved.

Work Queue

WorkflowsDispatchTrackingDecoupledFault Tolerant

17© Cloudera, Inc. All rights reserved.

Common Architectural Patterns in the Cloud

Object Storage

Source Data Seed Data Backup/DR

ETL/MODELING(Spark, MapReduce)

• Short-running clusters• Elastic workload• No local storage

necessary

|WASB |SWIFT |BLOB

• Long-running clusters• Sized to demand• Some local storage

BI/ANALYTICS(Impala, Solr)

• Fixed clusters • Periodic sync• Default to local

storage

APP DELIVERY(HBase, Kudu)

18© Cloudera, Inc. All rights reserved.

Multi-user

SecureIsolatedFriendly

19© Cloudera, Inc. All rights reserved.

Elastic

Grow or shrinkBusiness hoursNumber of usersStorage vs. ComputeCost efficient

20© Cloudera, Inc. All rights reserved.

Common Architectural Patterns in the Cloud

Object Storage

Source Data Seed Data Backup/DR

ETL/MODELING(Spark, MapReduce)

• Short-running clusters• Elastic workload• No local storage

necessary

|WASB |SWIFT |BLOB

• Long-running clusters• Sized to demand• Some local storage

BI/ANALYTICS(Impala, Solr)

• Fixed clusters • Periodic sync• Default to local

storage

APP DELIVERY(HBase, Kudu)

21© Cloudera, Inc. All rights reserved.

Advanced Monitoring

LatencyResource utilizationConsistent performance

22© Cloudera, Inc. All rights reserved.

High availability and failure domains

Data durabilityRepair within SLAHost-to-instance

23© Cloudera, Inc. All rights reserved.

Backup and disaster recovery

Object store centricActive-Standby

24© Cloudera, Inc. All rights reserved.

Imagine the Future

Portable ExperienceSelf-serviceSelf-healingGranular SecurityAdvanced GovernanceComplete Management

What’s your vision?

25© Cloudera, Inc. All rights reserved.

Thank [email protected]

26© Cloudera, Inc. All rights reserved.

Resources

Cloudera Director: http://www.cloudera.com/director

Interested in API level integration and scripting?

https://github.com/cloudera/director-sdk

https://github.com/cloudera/director-scripts

Interested in integration with another cloud platform?

https://github.com/cloudera/director-spi

https://github.com/cloudera/director-google-plugin

Cloudera Director Screenshots

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

© 2014 Cloudera, Inc. All rights reserved.

44© Cloudera, Inc. All rights reserved.

Thank [email protected]