Riding the Elephant - Hadoop 2.0
DESCRIPTION
Hadoop 2.0, and in particular YARN, has opened up a lot of potential applications beyond MapReduce. This presentation explains some of the ways this happened and what you can now do that you couldn't before. It also introduces some new tools (Spark) and infrastructure pieces (Mesos) for even more efficient cluster use.
TRANSCRIPT
Simon Elliston Ball Head of Big Data - Red Gate Ventures
@sireb
Riding the Elephant: Hadoop 2.0
http://bit.ly/RidingElephants
In the beginning…
Append-only distributed file system
MapReduce
Java.
More languages
JVM-based (Scala, Groovy, Jython, Clojure)
Streaming (Python, or any executable that reads stdin and writes stdout)
HDP for Windows and .NET SDK
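Hadoop Streaming lets any executable act as a mapper or reducer: records arrive on stdin, tab-separated key/value lines go to stdout, and the framework sorts mapper output by key before the reducer sees it. The word count below is a minimal sketch of that contract; `run_local` and its use of `sorted` are assumptions standing in for Hadoop's shuffle so the pair can be tried in a single process.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per whitespace-separated word (lower-cased)."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_lines):
    """Sum the counts for each word; relies on input already sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_local(lines):
    """Simulate map -> sort (standing in for the shuffle) -> reduce in one process."""
    return list(reducer(sorted(mapper(lines))))
```

In a real Streaming job each function would be wrapped in a small script that reads `sys.stdin` and prints its output, and the two scripts passed to the streaming jar through its `-mapper` and `-reducer` options.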
Abstraction
Photo: https://www.flickr.com/photos/puroticorico/
Hive, Pig
Cascading
Scalding
SQL on Hadoop
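Tools such as Hive and Pig let you write a declarative query and compile it down to map and reduce stages. As a rough illustration only (not Hive's actual query planner), a query like `SELECT dept, AVG(salary) FROM emp GROUP BY dept` splits into the three phases below; the table, column names and functions are hypothetical.

```python
from collections import defaultdict


def map_phase(rows):
    """Project each (dept, salary) row to a (group key, value) pair."""
    for dept, salary in rows:
        yield dept, salary


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate (AVG) to each group."""
    return {key: sum(values) / len(values) for key, values in groups.items()}


def group_by_avg(rows):
    return reduce_phase(shuffle(map_phase(rows)))
```

The point of the abstraction layer is that users write only the query; the key choice (`dept`) and the aggregate (`AVG`) are what the compiler turns into map and reduce code.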
Learning to share the toys
HBase
Solr on Hadoop
Sharing HDFS…
MapReduce v1
[Diagram: a Job is submitted to the single JobTracker on the head node. Each data node runs a TaskTracker with a fixed set of map slots (m slot 1…n) and reduce slots (r slot 1…n), each slot running one map or reduce task. TaskTrackers report MR status back to the JobTracker.]
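The fixed split between map slots and reduce slots is what limits utilisation in MapReduce v1: a slot can only ever run its own kind of task. The toy model below (a simplification, with hypothetical names and slot counts) counts how many tasks can run at once and how many slots sit idle.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    map_slots: int      # slots that may only run map tasks
    reduce_slots: int   # slots that may only run reduce tasks


def schedule_v1(trackers, pending_maps, pending_reduces):
    """Return (maps running, reduces running, idle slots) under fixed slot types."""
    map_capacity = sum(t.map_slots for t in trackers)
    reduce_capacity = sum(t.reduce_slots for t in trackers)
    maps = min(pending_maps, map_capacity)
    reduces = min(pending_reduces, reduce_capacity)
    idle = (map_capacity - maps) + (reduce_capacity - reduces)
    return maps, reduces, idle
```

With three TaskTrackers of 4 map and 2 reduce slots each and 100 queued map tasks, only 12 maps run while all 6 reduce slots stay idle, because they cannot be lent to the waiting maps.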
Typical Hadoop 1.x setup
HBase
Production
Ad hoc
YARN architecture
[Diagram: a YARN Client submits an application to the ResourceManager. Each data node runs a Node Manager that hosts containers. Each application's Application Master runs in one container and its work runs in further containers spread across the nodes; unused capacity appears as a free slot.]
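Instead of typed slots, the ResourceManager hands out generic containers sized by resources, and any application (MapReduce or otherwise) can use them. Real YARN schedulers are pluggable (for example the CapacityScheduler and FairScheduler) and weigh queues, memory, vcores and data locality; the first-fit, memory-only allocator below is a deliberately simplified, hypothetical stand-in to show the shape of the interaction.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    memory_mb: int
    used_mb: int = 0

    def free_mb(self):
        return self.memory_mb - self.used_mb


@dataclass
class ResourceManager:
    nodes: list
    allocations: list = field(default_factory=list)  # (app_id, node, size_mb)

    def allocate(self, app_id, container_mb, count):
        """Grant up to `count` containers of `container_mb`, first-fit across nodes."""
        granted = []
        for _ in range(count):
            node = next((n for n in self.nodes if n.free_mb() >= container_mb), None)
            if node is None:
                break  # cluster full; an Application Master would ask again later
            node.used_mb += container_mb
            granted.append(node.name)
            self.allocations.append((app_id, node.name, container_mb))
        return granted
```

A typical sequence is one small container for an application's Application Master, followed by the master requesting containers for its own tasks; a second application can then pick up whatever capacity is left.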
Removing the choke point
Advantages
60%-150% better usage
Long-running applications
Operating system for Big Data?
Not quite…
Security
…but a framework for Big Data Apps
Data Access abstraction
Storm on YARN
A whole batch of new applications
HOYA (HBase on YARN)
Tez (Stinger)
MapReduce 2
Giraph
<Insert your application here>
Batch applications
Spinning YARNs with Spring
Services
Direct to YARN APIs
Spring Data Hadoop abstraction
Streaming
Why?
Machine Learning
Graphs
Services
Distributed Shell - Anything.
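Graph processing is one of the workloads that fits poorly into chained MapReduce jobs. Giraph follows Google's Pregel model: computation proceeds in synchronous supersteps in which each vertex processes incoming messages, updates its value and messages its neighbours. The function below is a single-process sketch of that model (not Giraph's Java API) using the classic "propagate the maximum value" example; all names are illustrative.

```python
def propagate_max(edges, initial_values, max_supersteps=30):
    """Vertex-centric max propagation in synchronous supersteps.

    edges: vertex -> list of neighbours that receive its messages
    initial_values: vertex -> starting value
    Returns (final values, number of supersteps run).
    """
    values = dict(initial_values)
    active = set(values)  # every vertex is active in the first superstep
    superstep = 0
    while active and superstep < max_supersteps:
        # Message phase: each active vertex sends its value to its neighbours.
        inbox = {v: [] for v in values}
        for v in active:
            for target in edges.get(v, []):
                inbox[target].append(values[v])
        # After the barrier: adopt a larger value and stay active;
        # vertices that learn nothing new vote to halt.
        active = set()
        for v, messages in inbox.items():
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                active.add(v)
        superstep += 1
    return values, superstep
```

For the chain `a - b - c` with values 3, 6 and 2, every vertex holds 6 after two supersteps, at which point no vertex is active and the computation stops.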
Spark
A higher abstraction
Hadoop based?
… but can run on YARN
[Feature chart: In memory / Distributed / Fault tolerant (RDDs) / Real-time]
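Spark's fault tolerance comes from RDDs (Resilient Distributed Datasets): each dataset records the transformations (its lineage) used to build it from stable input, so a lost in-memory partition can be recomputed instead of restored from a replica. `ToyRDD` below is a hypothetical, single-process sketch of that idea; its `map`, `filter` and `collect` names echo Spark's, but it is not the PySpark API and, unlike Spark, it keeps every partition it computes.

```python
class ToyRDD:
    """Toy RDD: partitions are rebuilt from lineage when lost, never replicated."""

    def __init__(self, source_partitions=None, parent=None, fn=None):
        self.source = source_partitions  # set only on the base dataset (stable input)
        self.parent = parent             # lineage: the RDD this one derives from
        self.fn = fn                     # partition -> partition transformation
        self.cache = {}                  # partition index -> computed elements

    def map(self, f):
        return ToyRDD(parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, pred):
        return ToyRDD(parent=self, fn=lambda part: [x for x in part if pred(x)])

    def num_partitions(self):
        return len(self.source) if self.parent is None else self.parent.num_partitions()

    def compute(self, i):
        if i not in self.cache:
            if self.parent is None:
                self.cache[i] = list(self.source[i])  # re-read stable input
            else:
                self.cache[i] = self.fn(self.parent.compute(i))  # replay lineage
        return self.cache[i]

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.compute(i)]

    def lose_partition(self, i):
        """Simulate a failed node: forget partition i here and in every ancestor."""
        rdd = self
        while rdd is not None:
            rdd.cache.pop(i, None)
            rdd = rdd.parent
```

After `lose_partition`, the next `collect` silently rebuilds the missing partition by replaying the recorded `map` and `filter` over the base data, which is the property the "fault tolerant" column refers to.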
Mesos
Wider sharing
[Diagram: Hadoop, Spark and Aurora running as Mesos frameworks on shared hardware; the Hadoop stack is shown as YARN, MapReduce, HBase, etc. over HDFS]
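Mesos shares a cluster through two-level scheduling: the Mesos master decides which framework to offer an agent's free resources to, and each framework's own scheduler decides whether to accept and which tasks to launch. The sketch below models one round of offers with hypothetical classes and a simple "offer in turn" policy; the real master uses a pluggable allocation policy and a much richer resource model.

```python
from dataclasses import dataclass


@dataclass
class Offer:
    agent: str
    cpus: float
    mem_mb: int


class Framework:
    """Hypothetical framework scheduler that launches fixed-size tasks."""

    def __init__(self, name, task_cpus, task_mem_mb, tasks_wanted):
        self.name = name
        self.task_cpus = task_cpus
        self.task_mem_mb = task_mem_mb
        self.tasks_wanted = tasks_wanted
        self.launched = []  # agents on which tasks were started

    def resource_offer(self, offer):
        """Use as much of the offer as needed; return the unused remainder."""
        cpus, mem = offer.cpus, offer.mem_mb
        while self.tasks_wanted and cpus >= self.task_cpus and mem >= self.task_mem_mb:
            cpus -= self.task_cpus
            mem -= self.task_mem_mb
            self.tasks_wanted -= 1
            self.launched.append(offer.agent)
        return Offer(offer.agent, cpus, mem)


def offer_round(agent_offers, frameworks):
    """Master side: offer each agent's free resources to the frameworks in turn."""
    for offer in agent_offers:
        for framework in frameworks:
            offer = framework.resource_offer(offer)
```

Because the master only offers resources and the frameworks choose what to run, Hadoop, Spark and other frameworks can each keep their own scheduling logic while sharing the same machines.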
Hadoop is more than MapReduce
The new world
YARN opens up new paradigms
Infrastructure maturing: better sharing
Hadoop and beyond!
Thank you
Questions?
Simon Elliston Ball Head of Big Data - Red Gate Ventures
http://bit.ly/RidingElephants