Apache Hadoop on the Open Cloud

Posted on 24-Feb-2016

David Dobbins, Nirmal Ranganathan

Who is using Apache Hadoop?

• Traditionally: developers

• Increasingly: business users and data scientists

• Why does this matter?

Configuring and managing a Hadoop cluster is hard

Resources / Expertise

Multiple Performance and Design Variables
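The transcript does not list the variables themselves. As a rough illustration of how a few common ones interact, the sketch below estimates worker-node count from input size. Only the replication factor of 3 (the HDFS default) comes from Hadoop; the scratch-space overhead, disk per node, and usable fraction are assumed values, and `nodes_needed` is a hypothetical helper, not a Hadoop tool.

```python
import math


def nodes_needed(data_tb: float,
                 replication: int = 3,
                 temp_overhead: float = 0.25,
                 disk_per_node_tb: float = 12.0,
                 usable_fraction: float = 0.8) -> int:
    """Back-of-the-envelope worker count for holding data_tb of input in HDFS.

    replication: HDFS block replication factor (3 is the HDFS default).
    temp_overhead: assumed extra space for intermediate/shuffle output,
        as a fraction of the replicated data (workload-dependent).
    disk_per_node_tb: assumed raw disk per worker node.
    usable_fraction: assumed share of raw disk left for HDFS after OS and logs.
    """
    # Raw capacity the cluster must provide, including replicas and scratch space.
    raw_needed_tb = data_tb * replication * (1 + temp_overhead)
    # Capacity each worker actually contributes to HDFS.
    usable_per_node_tb = disk_per_node_tb * usable_fraction
    # Round up: a fractional node still means buying a whole node.
    return math.ceil(raw_needed_tb / usable_per_node_tb)
```

For example, 10 TB of input with these defaults needs 10 × 3 × 1.25 = 37.5 TB of raw space, or 4 nodes at 9.6 TB usable each. Changing any single assumption moves the answer, which is why sizing a cluster by hand takes expertise.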

The cloud solves some of these problems

Advantages of using the cloud

Fast

Easy

Flexible

You still require expertise

Let's look at another option

Hadoop in the Cloud Use Cases

Development / POC Clusters

Dynamic Clusters

Growth Clusters

Your data is already in the Cloud

Demo

Run an actual job
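The transcript does not say which job the demo ran. The classic first MapReduce job is word count, so the sketch below simulates its map, shuffle, and reduce phases in a single Python process to make the data flow visible. It illustrates the programming model only; it is not Hadoop code and not the job shown in the demo, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Emit (word, 1) for every whitespace-separated token, case-folded.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group values by key, as the framework does between map and reduce.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Sum the partial counts for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Chain the three phases over every input line.
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a real cluster the map and reduce functions run in parallel on many nodes and the shuffle moves data across the network, but the contract between the phases is the same.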

Swift Filesystem for Hadoop: HADOOP-8545

• New filesystem URL scheme: swift://

• Read from and write to local and remote Swift clusters

• Keep long-lived data in Swift; upload it while the Hadoop cluster is offline
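The `swift://` scheme addresses an object through a container and a named service. Below is a minimal sketch of splitting such a URL into its parts, assuming the `swift://<container>.<service>/<object path>` layout used by Hadoop's OpenStack (hadoop-openstack) module, where the service name selects endpoint and credential settings in the Hadoop configuration. `parse_swift_url` and `SwiftPath` are hypothetical names, and the container `logs` and service `rackspace` are example values.

```python
from typing import NamedTuple
from urllib.parse import urlparse


class SwiftPath(NamedTuple):
    container: str    # Swift container holding the object
    service: str      # provider name used to look up endpoint/credentials (assumed layout)
    object_name: str  # object name inside the container


def parse_swift_url(url: str) -> SwiftPath:
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url}")
    # Assumed layout: the host part is "<container>.<service>". Split on the
    # first dot so a service name that itself contains dots stays intact.
    container, sep, service = parsed.netloc.partition(".")
    if not sep or not container or not service:
        raise ValueError(f"expected swift://<container>.<service>/...: {url}")
    return SwiftPath(container, service, parsed.path.lstrip("/"))
```

For instance, `swift://logs.rackspace/2013/01/part-00000` yields container `logs`, service `rackspace`, and object `2013/01/part-00000`.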

The challenges of running MapReduce jobs against Swift:

• Identity management

• Block size

• Object store vs. file paths

• Direct API into Swift from HDFS
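Of the challenges listed, "object store vs. file paths" is the easiest to see in code. Swift stores objects under flat names, so directories have to be emulated with name prefixes, and a directory rename becomes a copy-then-delete of every object beneath it. The toy in-memory `FlatObjectStore` below is a hypothetical illustration of that mismatch, not the hadoop-openstack implementation.

```python
from typing import Dict, List


class FlatObjectStore:
    """Toy object store: a flat map of object names to bytes.

    A name like "out/part-0" is just a key that contains slashes;
    there are no real directories.
    """

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self.objects[name] = data

    def list_dir(self, prefix: str) -> List[str]:
        # Emulate a directory listing by collapsing every key under the
        # prefix to its next path component (Swift container listings offer
        # similar prefix and delimiter parameters).
        prefix = prefix.rstrip("/") + "/"
        children = set()
        for name in self.objects:
            if name.startswith(prefix):
                head, slash, _ = name[len(prefix):].partition("/")
                children.add(head + slash)  # trailing "/" marks a pseudo-directory
        return sorted(children)

    def rename_dir(self, src: str, dst: str) -> int:
        # No server-side directory rename: copy each object, then delete the
        # originals. This costs one operation per object and is not atomic,
        # which matters because MapReduce commits task output by renaming.
        src = src.rstrip("/") + "/"
        dst = dst.rstrip("/") + "/"
        moved = [n for n in self.objects if n.startswith(src)]
        for name in moved:
            self.objects[dst + name[len(src):]] = self.objects[name]
        for name in moved:
            del self.objects[name]
        return len(moved)
```

Renaming `out/_temporary/a` to `out/final` touches each object individually; a failure partway through leaves objects under both names, whereas a single HDFS rename is atomic.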

MapReduce to Swift (via “HDFS”)

[Diagram: two stacks side by side. Standard: Application X on MapReduce on HDFS. Swift: Application X on MapReduce on an HDFS proxy backed by Swift.]
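The diagram's point is that Application X and MapReduce stay the same while the storage underneath changes. Hadoop chooses a FileSystem implementation from the URI scheme (configurable through `fs.<scheme>.impl` keys). The registry below is a hypothetical, simplified stand-in for that idea, not Hadoop's API; `InMemoryStore`, `FileSystemRegistry`, and `count_words` are illustrative names.

```python
from typing import Dict, Protocol
from urllib.parse import urlparse


class FileSystem(Protocol):
    """The only storage interface the application layer depends on."""

    def read(self, path: str) -> bytes: ...


class InMemoryStore:
    """Hypothetical stand-in for a backend such as HDFS or Swift."""

    def __init__(self, name: str, files: Dict[str, bytes]) -> None:
        self.name = name
        self.files = files

    def read(self, path: str) -> bytes:
        return self.files[path]


class FileSystemRegistry:
    """Maps a URL scheme to a backend, similar in spirit to fs.<scheme>.impl."""

    def __init__(self) -> None:
        self._by_scheme: Dict[str, FileSystem] = {}

    def register(self, scheme: str, fs: FileSystem) -> None:
        self._by_scheme[scheme] = fs

    def get(self, url: str) -> FileSystem:
        scheme = urlparse(url).scheme
        if scheme not in self._by_scheme:
            raise ValueError(f"no filesystem registered for scheme {scheme!r}")
        return self._by_scheme[scheme]


def count_words(registry: FileSystemRegistry, url: str) -> int:
    # "Application X": it sees only a URL and the FileSystem interface,
    # so switching from hdfs:// to swift:// needs no change here.
    fs = registry.get(url)
    return len(fs.read(urlparse(url).path).split())
```

Registering an `hdfs` backend and a `swift` backend lets the same `count_words` call read from either, selected purely by the input URL.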

Hadoop + OpenStack

Cloud Big Data Platform

• Hortonworks Data Platform (HDP)
  • HDP 1.1
  • HDP 1.3
  • Pig, Hive, HCatalog
  • Coming soon: HDP 2.0

Cloud Big Data Platform

•Secure by default

•Comes pre-optimized

•Web UI, CLI, REST API

Built on OpenStack

Why an Open Platform matters

Sandbox on Rackspace Cloud

Sandbox VM

RAX

Resell

Cool stuff

@caffiend @rnirmal

http://www.rackspace.com/big-data
