Riding the Elephant - Hadoop 2.0

Simon Elliston Ball Head of Big Data - Red Gate Ventures @sireb Riding the Elephant: Hadoop 2.0 http://bit.ly/RidingElephants

Upload: simon-elliston-ball

Posted on 06-May-2015


DESCRIPTION

Hadoop 2.0, and in particular YARN, has opened up many potential applications beyond MapReduce. This presentation explains how that happened and what you can now do that you couldn't before. It also introduces a new tool (Spark) and an infrastructure piece (Mesos) that make cluster use even more efficient.

TRANSCRIPT

Page 1: Riding the Elephant - Hadoop 2.0

Simon Elliston Ball Head of Big Data - Red Gate Ventures

@sireb

Riding the Elephant: Hadoop 2.0

http://bit.ly/RidingElephants

Page 2: Riding the Elephant - Hadoop 2.0

In the beginning…

Append only distributed file-system

Map Reduce

Java.

Page 3: Riding the Elephant - Hadoop 2.0

More languages

JVM Based (scala, groovy, jython, clojure)

Streaming (python, whatever)

HDP for Windows and .NET SDK
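The "Streaming (python, whatever)" route can be illustrated with a word count. This is a minimal sketch, assuming the usual Hadoop Streaming contract of tab-separated key/value lines with reducer input sorted by key. The function names and the local `run_locally` helper are hypothetical and only stand in for the framework.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: input is sorted by key, so equal keys are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Hypothetical stand-in for the cluster: map, sort (the shuffle), reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))


# Local dry run on two input lines.
for record in run_locally(["the cat", "The dog"]):
    print(record)
```

On a cluster the two functions would live in separate scripts that read standard input and write standard output, passed to the streaming jar through its `-mapper` and `-reducer` options.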

Page 4: Riding the Elephant - Hadoop 2.0

Abstraction

Photo: https://www.flickr.com/photos/puroticorico/

Hive, Pig

Cascading

Scalding

Page 5: Riding the Elephant - Hadoop 2.0

SQL on Hadoop

Learning to share the toys

HBase

Solr on Hadoop

Sharing HDFS…

Page 6: Riding the Elephant - Hadoop 2.0

Map Reduce v1

[Diagram: JobTracker and Job on the Head Node; three Tasks; three Data Nodes, each with a TaskTracker running a Task (Map / Reduce) and fixed slots m slot 1, m slot 2, … m slot n and r slot 1, r slot 2, … r slot n.]
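The fixed map and reduce slots in this picture are easier to reason about with a toy model. The sketch below is an illustration only, not Hadoop code: `TaskTracker`, `schedule` and `utilisation` are hypothetical names, and the slot counts are made up. It shows how typed slots can sit idle while tasks of the other type queue.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    """One data node in the MRv1 model: a fixed number of typed slots."""
    map_slots: int
    reduce_slots: int
    running: list = field(default_factory=list)

    def free(self, kind):
        total = self.map_slots if kind == "map" else self.reduce_slots
        return total - self.running.count(kind)

    def try_start(self, kind):
        if self.free(kind) > 0:
            self.running.append(kind)
            return True
        return False


def schedule(trackers, tasks):
    """Hypothetical JobTracker loop: put each task in the first free slot of its kind."""
    pending = []
    for kind in tasks:
        if not any(t.try_start(kind) for t in trackers):
            pending.append(kind)
    return pending


def utilisation(trackers):
    """Fraction of all slots (map and reduce) currently in use."""
    used = sum(len(t.running) for t in trackers)
    total = sum(t.map_slots + t.reduce_slots for t in trackers)
    return used / total


# Three nodes with 2 map and 2 reduce slots each; a job in its map phase.
trackers = [TaskTracker(map_slots=2, reduce_slots=2) for _ in range(3)]
pending = schedule(trackers, ["map"] * 10)
print(len(pending), utilisation(trackers))  # prints "4 0.5"
```

Four map tasks wait even though half the slots in the cluster are empty, because the empty ones are reduce slots.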

Page 7: Riding the Elephant - Hadoop 2.0

Map Reduce v1

[Diagram: JobTracker and Job on the Head Node; three MR Status messages; three Data Nodes, each with a TaskTracker running a Task (Map / Reduce) and fixed slots m slot 1, m slot 2, … m slot n and r slot 1, r slot 2, … r slot n.]

Page 8: Riding the Elephant - Hadoop 2.0

Typical Hadoop 1.x setup

HBase

Production

Adhoc


Page 10: Riding the Elephant - Hadoop 2.0

YARN architecture

[Diagram: a YARN Client and the ResourceManager; three Data Nodes, each with a Node Manager. The nodes host Containers and two Application Masters, and one node has a Free Slot.]
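The roles in this diagram can be sketched in a few lines. This is a deliberately simplified model under stated assumptions, not the YARN API: all class and method names are hypothetical, memory is the only resource, and the ResourceManager places containers directly instead of negotiating with the Application Master as it does in real YARN.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Tracks free memory on one data node and the containers running there."""
    name: str
    memory_mb: int
    containers: list = field(default_factory=list)

    def launch(self, owner, memory_mb):
        if memory_mb > self.memory_mb:
            return None
        self.memory_mb -= memory_mb
        container = (self.name, owner, memory_mb)
        self.containers.append(container)
        return container


class ResourceManager:
    """Cluster-wide arbiter: knows the nodes, hands out containers."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, owner, memory_mb):
        for node in self.nodes:
            container = node.launch(owner, memory_mb)
            if container:
                return container
        return None

    def submit(self, app_name, am_memory_mb):
        """Client entry point: the first container runs the Application Master."""
        if self.allocate(f"{app_name}-AM", am_memory_mb) is None:
            return None
        return ApplicationMaster(app_name, self)


class ApplicationMaster:
    """Per-application coordinator that asks for worker containers."""

    def __init__(self, app_name, rm):
        self.app_name = app_name
        self.rm = rm
        self.containers = []

    def request(self, count, memory_mb):
        for _ in range(count):
            container = self.rm.allocate(self.app_name, memory_mb)
            if container is None:
                break
            self.containers.append(container)
        return len(self.containers)


nodes = [NodeManager(f"node{i}", 4096) for i in (1, 2, 3)]
rm = ResourceManager(nodes)
etl = rm.submit("etl", 1024)       # a batch job
etl_granted = etl.request(4, 2048)
storm = rm.submit("storm", 1024)   # a long-running service on the same nodes
storm_granted = storm.request(2, 1024)
print(etl_granted, storm_granted)
```

Containers are sized by request instead of being typed as map or reduce, so two different kinds of application can fill the same nodes.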


Page 14: Riding the Elephant - Hadoop 2.0

Removing the choke point

Advantages

60%-150% better usage

Long running applications

Page 15: Riding the Elephant - Hadoop 2.0

Not quite…

Operating system for Big Data?

Security

…but a framework for Big Data Apps

Data Access abstraction

Page 16: Riding the Elephant - Hadoop 2.0

Storm on YARN

A whole batch of new applications

HOYA

Tez (Stinger)

MapReduce 2

Giraph

<Insert your application here>

Page 17: Riding the Elephant - Hadoop 2.0

Batch applications

Spinning YARNs with Spring

Services

Direct to YARN APIs

Spring Data Hadoop abstraction

Page 18: Riding the Elephant - Hadoop 2.0

Streaming

Why?

Machine Learning

Graphs

Services

Distributed Shell - Anything.

Page 19: Riding the Elephant - Hadoop 2.0

Spark

A higher abstraction

In Memory ✓

Distributed ✓

Fault tolerant ✓

Real-time ✓

Hadoop based? ❌

… but can run on YARN

RDDs
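The in-memory and fault-tolerant points both come from how Spark's RDDs (Resilient Distributed Datasets) work: a dataset remembers the transformations that produced it, so a lost partition can be recomputed. The toy class below is a sketch of that idea in plain Python. It is not the Spark API, and `MiniRDD` with its methods is hypothetical.

```python
class MiniRDD:
    """Toy stand-in for an RDD: source partitions plus the recipe to rebuild results."""

    def __init__(self, source_partitions, lineage=()):
        self._source = source_partitions  # list of lists, one per partition
        self._lineage = lineage           # tuple of (kind, function) steps
        self._cache = {}                  # partition index -> computed list

    def map(self, fn):
        # Transformations are lazy: they only extend the lineage.
        return MiniRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, fn):
        return MiniRDD(self._source, self._lineage + (("filter", fn),))

    def _compute(self, index):
        data = self._source[index]
        for kind, fn in self._lineage:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def partition(self, index):
        # Keep computed partitions in memory; rebuild from lineage if missing.
        if index not in self._cache:
            self._cache[index] = self._compute(index)
        return self._cache[index]

    def lose_partition(self, index):
        """Simulate a node failure that drops one in-memory partition."""
        self._cache.pop(index, None)

    def collect(self):
        return [x for i in range(len(self._source)) for x in self.partition(i)]


rdd = MiniRDD([[1, 2, 3], [4, 5, 6]]).map(lambda x: x * x).filter(lambda x: x % 2 == 0)
first = rdd.collect()
rdd.lose_partition(1)
second = rdd.collect()  # partition 1 is recomputed from its lineage
print(first, second)
```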

Page 20: Riding the Elephant - Hadoop 2.0

Mesos

Wider sharing

Hadoop

Spark

Aurora

Mesos Framework

Hardware

YARN

MapReduce

HBase

etc

HDFS
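Mesos shares hardware between frameworks by offering free resources and letting each framework decide what to accept. The sketch below models one offer round under simplifying assumptions: CPUs are the only resource, offers go to frameworks in list order, and the `Framework` class and `offer_round` function are hypothetical, not Mesos code.

```python
class Framework:
    """A scheduler registered with the cluster manager, e.g. YARN or Spark."""

    def __init__(self, name, pending_tasks, cpus_per_task):
        self.name = name
        self.pending_tasks = pending_tasks
        self.cpus_per_task = cpus_per_task

    def consider(self, offered_cpus):
        """Accept as many whole tasks as fit in the offer; return CPUs used."""
        tasks = min(self.pending_tasks, offered_cpus // self.cpus_per_task)
        self.pending_tasks -= tasks
        return tasks * self.cpus_per_task


def offer_round(agents, frameworks):
    """Offer each agent's free CPUs to every framework in turn."""
    placements = []
    for agent, free in agents.items():
        for framework in frameworks:
            used = framework.consider(free)
            if used:
                placements.append((agent, framework.name, used))
                free -= used
        agents[agent] = free  # whatever nobody accepted stays free
    return placements


agents = {"node1": 8, "node2": 8}
frameworks = [Framework("yarn", 3, 4), Framework("spark", 4, 2)]
placements = offer_round(agents, frameworks)
print(placements)
```

The placement decision sits with the frameworks, which is what lets Hadoop, Spark and Aurora share one pool of machines without knowing about each other.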

Page 21: Riding the Elephant - Hadoop 2.0

Hadoop is more than MapReduce

The new world

YARN opens up new paradigms

Infrastructure maturing: better sharing

Page 22: Riding the Elephant - Hadoop 2.0

Hadoop and beyond!

Page 23: Riding the Elephant - Hadoop 2.0

Thank you

Page 24: Riding the Elephant - Hadoop 2.0

Questions?

Simon Elliston Ball Head of Big Data - Red Gate Ventures

@sireb

http://bit.ly/RidingElephants