Apache Hadoop on the Open Cloud
David Dobbins, Nirmal Ranganathan
Who is using Apache Hadoop
•Traditionally = Developers
•Increasingly = Business Users / Data Scientists
•Why does this matter?
Configuring and managing a Hadoop cluster is hard
Resources / Expertise
Multiple Performance and Design Variables
The Cloud solves some of these
Advantages of using the cloud
•Fast
•Easy
•Flexible
You still require expertise
Let's check out another option
Hadoop in the Cloud Use Cases
Development / POC Clusters
Dynamic Clusters
Growth Clusters
Your data is already in the Cloud
Demo
Run an actual job
Swift Filesystem for Hadoop: HADOOP-8545
•New filesystem URL, swift://
•Read from, write to local & remote Swift clusters
•Keep long-lived data in Swift; upload while Hadoop cluster off-line
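To make the swift:// URL concrete, here is a minimal sketch of how such a URL breaks down. It assumes the swift://container.service/path layout described in the hadoop-openstack documentation; the function name, the SwiftLocation type, and the example service name "rackspace" are illustrative, not part of Hadoop.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class SwiftLocation(NamedTuple):
    """Pieces of a swift:// URL (hypothetical helper type)."""
    container: str  # Swift container holding the objects
    service: str    # name of a Swift service defined in the Hadoop config
    path: str       # object path inside the container


def parse_swift_url(url: str) -> SwiftLocation:
    """Split swift://container.service/path into its parts.

    Assumes the host part is "<container>.<service>"; this is a sketch,
    not the parser Hadoop itself uses.
    """
    parts = urlsplit(url)
    if parts.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url!r}")
    container, sep, service = parts.netloc.partition(".")
    if not sep or not container or not service:
        raise ValueError(f"expected swift://container.service/path, got {url!r}")
    return SwiftLocation(container, service, parts.path or "/")


if __name__ == "__main__":
    # "data" and "rackspace" are made-up container and service names.
    print(parse_swift_url("swift://data.rackspace/logs/part-00000"))
```

The service half of the host name is what lets one job address more than one Swift cluster, local or remote, side by side.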
The challenges of running Map Reduce jobs against Swift
•Identity management
•Block size
•Object store vs file paths
•Direct API into Swift from HDFS
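The "object store vs file paths" point can be illustrated with a small sketch. An object store keeps a flat set of object names per container, so a "directory" has to be emulated by filtering on a name prefix and cutting at the next "/". The code below is a simplified, in-memory illustration of that idea; list_directory and the sample object names are hypothetical and this is not how the Hadoop Swift client is implemented.

```python
def list_directory(object_names, directory):
    """Return the immediate children of `directory` in a flat namespace.

    Pseudo-directories are returned with a trailing "/", plain objects
    without one. Purely illustrative; no Swift API is called.
    """
    prefix = directory.strip("/")
    prefix = prefix + "/" if prefix else ""
    children = set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            continue  # the directory marker itself, if one exists
        head, sep, _ = rest.partition("/")
        children.add(head + sep)  # sep is "/" only for deeper paths
    return sorted(children)


if __name__ == "__main__":
    # Made-up object names in a single container.
    objects = ["logs/2013/a.txt", "logs/2013/b.txt", "logs/readme", "other/x"]
    print(list_directory(objects, "logs"))
```

Every listing here scans object names rather than walking a directory tree, which is one reason operations that are cheap on HDFS, such as renaming a directory, can be expensive against an object store.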
Map Reduce to Swift (via “HDFS”)
[Diagram: Application X / MapReduce / HDFS, alongside Application X / MapReduce / HDFS Proxy / Swift]
Hadoop + OpenStack
Cloud Big Data Platform
•Hortonworks Data Platform
•HDP 1.1
•HDP 1.3
•Pig, Hive, HCatalog
•Coming soon: HDP 2.0
Cloud Big Data Platform
•Secure by default
•Comes pre-optimized
•Web UI, CLI, REST API
Built on OpenStack
Why an Open Platform matters
[Diagram labels: Sandbox on Rackspace Cloud, Sandbox VM, RAX Resell]
Cool stuff
@caffiend @rnirmal
http://www.rackspace.com/big-data