Apache Hadoop on the Open Cloud David Dobbins Nirmal Ranganathan

Uploaded by emil-blair

Posted on 29-Jan-2016

Page 1

Apache Hadoop on the Open Cloud

David Dobbins

Nirmal Ranganathan

Page 2

Who is using Apache Hadoop

• Traditionally: Developers

• Increasingly: Business Users / Data Scientists

• Why does this matter?

Page 3

Configuring and managing a Hadoop cluster is hard

Page 4

Resources / Expertise

Page 5

Multiple Performance and Design Variables

Page 6

The Cloud solves some of these

Page 7

Advantages of using the cloud

Fast

Easy

Flexible

Page 8

You still require expertise

Page 9

Let's check out another option

Page 10

Hadoop in the Cloud Use Cases

Page 11

Development / POC Clusters

Page 12

Dynamic Clusters

Page 13

Growth Clusters

Page 14

Your data is already in the Cloud

Page 15

Demo

Run an actual job

Page 16

Swift Filesystem for Hadoop: HADOOP-8545

• New filesystem URL scheme: swift://

• Read from and write to local and remote Swift clusters

• Keep long-lived data in Swift; upload it while the Hadoop cluster is offline
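
As a rough illustration of the new `swift://` URL, the minimal Python sketch below splits one into its parts. It assumes the `swift://<container>.<service>/<object path>` layout used by Hadoop's hadoop-openstack module (the implementation of HADOOP-8545), where the service name selects a configured Swift endpoint; `parse_swift_url` and the sample names `logs` and `rackspace` are hypothetical.

```python
from urllib.parse import urlparse


def parse_swift_url(url: str) -> tuple[str, str, str]:
    """Split a swift:// URL into (container, service, object_name).

    Assumes the form swift://<container>.<service>/<object path>; this is an
    illustrative sketch, not Hadoop's own parser.
    """
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url!r}")
    # netloc keeps its original case: text before the first dot is the
    # container, the remainder is the service name.
    container, dot, service = parsed.netloc.partition(".")
    if not dot or not container or not service:
        raise ValueError(f"expected swift://<container>.<service>/...: {url!r}")
    # Swift has no real directories: the "path" is simply the object's name.
    object_name = parsed.path.lstrip("/")
    return container, service, object_name


if __name__ == "__main__":
    print(parse_swift_url("swift://logs.rackspace/2013/06/part-0000"))
```

Running it prints `('logs', 'rackspace', '2013/06/part-0000')`. Everything after the host is a single flat object name, so the slashes in it are a naming convention rather than real directories.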

The challenges of running MapReduce jobs against Swift:

• Identity management

• Block size

• Object store vs file paths

• Direct API into Swift from HDFS
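
Block size is a challenge because Swift objects are not stored as HDFS blocks, yet MapReduce still divides its input into splits, one map task each, using the block size the filesystem reports. The sketch below mirrors the split-size rule of Hadoop's `FileInputFormat` (`max(minSize, min(maxSize, blockSize))`, with the final split allowed to run up to 10% over) to show how the notional block size a Swift connector reports decides how many map tasks one large object gets. The function names and sample sizes are illustrative assumptions, not values from the talk.

```python
# Slack Hadoop's FileInputFormat allows before cutting another split.
SPLIT_SLOP = 1.1
MiB = 1024 * 1024


def compute_split_size(block_size: int, min_size: int = 1,
                       max_size: int = 2**63 - 1) -> int:
    """Split size as max(minSize, min(maxSize, blockSize))."""
    return max(min_size, min(max_size, block_size))


def plan_splits(length: int, block_size: int) -> list[tuple[int, int]]:
    """Return (offset, length) pairs; each pair would become one map task."""
    split_size = compute_split_size(block_size)
    splits = []
    offset, remaining = 0, length
    # Cut full-sized splits while more than 1.1 splits' worth remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((offset, split_size))
        offset += split_size
        remaining -= split_size
    # Whatever is left (at most 1.1 x split_size) becomes the final split.
    if remaining:
        splits.append((offset, remaining))
    return splits


if __name__ == "__main__":
    one_gib = 1024 * MiB
    for block in (32 * MiB, 128 * MiB):
        print(block // MiB, "MiB blocks ->", len(plan_splits(one_gib, block)), "splits")
```

Running it prints `32 MiB blocks -> 32 splits` and `128 MiB blocks -> 8 splits`: for a 1 GiB object, the reported block size, not any physical layout inside Swift, sets the job's map-side parallelism.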

Page 17

MapReduce to Swift (via “HDFS”)

Diagram: two stacks, Application X / MapReduce / HDFS, and Application X / MapReduce / HDFS Proxy / SWIFT
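
The HDFS Proxy layer on this slide lets MapReduce keep issuing file-path operations while the bytes live in Swift, whose containers hold a flat list of object names. A minimal sketch of that translation follows, assuming an in-memory dictionary as a stand-in for a Swift container; `InMemoryContainer` and `PathProxy` are hypothetical names, not Hadoop or Swift classes. Directory listings are emulated the way a Swift container listing does it with `prefix` and `delimiter` query parameters: names sharing a prefix up to the next `/` collapse into one pseudo-directory.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryContainer:
    """Stand-in for a Swift container: a flat map of object name -> bytes."""
    objects: dict[str, bytes] = field(default_factory=dict)

    def list_names(self, prefix: str = "", delimiter: str = "/") -> list[str]:
        """Emulate a prefix/delimiter listing: names below `prefix` are
        returned whole, or cut off after the next delimiter (a pseudo-dir)."""
        entries = set()
        for name in self.objects:
            if name.startswith(prefix):
                head, sep, _ = name[len(prefix):].partition(delimiter)
                entries.add(prefix + head + sep)
        return sorted(entries)


class PathProxy:
    """Hypothetical adapter giving file-path semantics over a flat container."""

    def __init__(self, container: InMemoryContainer) -> None:
        self.container = container

    @staticmethod
    def _to_key(path: str) -> str:
        # "/logs/README" and "logs/README" both map to object "logs/README".
        return path.strip("/")

    def read(self, path: str) -> bytes:
        key = self._to_key(path)
        try:
            return self.container.objects[key]
        except KeyError:
            raise FileNotFoundError(path) from None

    def list_dir(self, path: str) -> list[str]:
        key = self._to_key(path)
        return self.container.list_names(prefix=key + "/" if key else "")


if __name__ == "__main__":
    fs = PathProxy(InMemoryContainer({
        "logs/2013/06/part-0000": b"a",
        "logs/2013/06/part-0001": b"b",
        "logs/README": b"hi",
    }))
    print(fs.list_dir("/logs"))
    print(fs.read("/logs/README"))
```

Here `list_dir("/logs")` returns `['logs/2013/', 'logs/README']`. The `logs/2013/` entry exists only as a shared name prefix, which is why a connector has to synthesize directories rather than read them from storage.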

Page 18

Hadoop + OpenStack

Page 19

Cloud Big Data Platform

• Hortonworks Data Platform

  • HDP 1.1

  • HDP 1.3

  • Pig, Hive, HCatalog

  • Coming soon: HDP 2.0

Page 20

Cloud Big Data Platform

• Secure by default

• Comes pre-optimized

• Web UI, CLI, REST API

Page 21

Built on OpenStack

Page 22

Why an Open Platform matters

Diagram labels: Sandbox on Rackspace Cloud, Sandbox, VM, RAX, Resell

Page 23

Cool stuff

Page 24

@caffiend

@rnirmal

http://www.rackspace.com/big-data