Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup


Upload: bigdatalondon

Post on 17-May-2015

Category: Technology

DESCRIPTION

"Hadoop Analytics on your data in place" Steve Watt leads engineering for the Hadoop and Big Data program at Red Hat. Most recently Steve has been focusing on Hadoop Interoperability and better enabling Hadoop support for alternative filesystems. Prior to Red Hat, Steve spent 2 years at Hewlett-Packard, first co-founding the Hadoop business and then leading engineering as the Hadoop CTO. Prior to HP, Steve was at IBM for 10 years where he created IBMs first Hadoop Distribution and was part of the team that built BigSheets, the first spreadsheet interface for Hadoop.

TRANSCRIPT

Page 1: Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL meetup

@wattsteve

Analytics on your data in place

Steve Watt, Red Hat

CC flickr Barta IV

Page 2:

Hadoop at Red Hat

Page 3:

But tonight I have my community hat on

CC flickr wcdumonts

Page 4:

Hadoop in 2007

Platform Layers | Technologies

Computational Runtimes | MapReduce, HBase

FileSystems | HDFS or Amazon S3

Infrastructures | x86 or Amazon EC2

CC flickr wwarby

Page 5:

Hadoop in 2013

Platform Layers | Technologies

Computational Runtimes | YARN, Giraph, MapReduce, HBase, Phoenix, Spark/BDAS, Drill, Impala, Stinger

FileSystems | HDFS + 13 Other Hadoop FileSystems

Infrastructures | System on a Chip, x86, Virtualization and Cloud

CC flickr lowfatbrains

Page 6:

CC flickr grufnik

Observation #1: The Hadoop FileSystem Interface is the keystone of the entire Ecosystem

Page 7:

CC flickr traftery

Observation #2: Moving data around just to analyze it is slow and expensive, especially if it requires a redundant repository
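A back-of-the-envelope estimate makes Observation #2 concrete. The sketch below computes how long a bulk copy takes at a given throughput and how much raw disk a second, replicated copy needs. The 100 TB data set and the 10 Gbit/s link are illustrative assumptions, not figures from the talk; the replication factor of 3 is the HDFS default.

```python
def transfer_hours(data_tb: float, link_gbit_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to copy data_tb terabytes over a link_gbit_s link.

    efficiency is the fraction of line rate actually sustained (0 < e <= 1).
    """
    if data_tb < 0 or link_gbit_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits = data_tb * 1e12 * 8  # decimal terabytes -> bits
    seconds = bits / (link_gbit_s * 1e9 * efficiency)
    return seconds / 3600


def raw_capacity_tb(data_tb: float, replication: int = 3) -> float:
    """Raw disk needed to hold a second, replicated copy of the data."""
    return data_tb * replication


# Illustrative numbers only (assumptions, not from the talk).
hours = transfer_hours(100, 10)
extra_disk = raw_capacity_tb(100)
print(f"copy time: {hours:.1f} h, extra raw capacity: {extra_disk:.0f} TB")
```

Under these assumptions the copy alone takes roughly 22 hours at full line rate, and the redundant repository needs about 300 TB of raw disk before any analysis starts.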

Page 8:

So how does this work?

By leveraging Hadoop’s pluggable FileSystem architecture

[Diagram labels: Any Application; HBase, MapReduce, YARN, Hadoop FS Clients; Hadoop FileSystem Interface; Hadoop FileSystem; Hadoop FileSystem Plugin; FileSystem Implementation]
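The pluggable FileSystem architecture can be illustrated with a small model: application code is written once against an interface, and any backend that implements the interface can be swapped in. This is a simplified sketch in Python, not Hadoop's Java API; the class names `FileSystem`, `InMemoryFileSystem`, and `LocalDirFileSystem` and the function `count_words` are hypothetical.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class FileSystem(ABC):
    """Hypothetical, heavily reduced stand-in for the Hadoop FileSystem interface."""

    @abstractmethod
    def read_text(self, path: str) -> str:
        ...


class InMemoryFileSystem(FileSystem):
    """Toy plugin backed by a dict."""

    def __init__(self, files: dict):
        self._files = files

    def read_text(self, path: str) -> str:
        return self._files[path]


class LocalDirFileSystem(FileSystem):
    """Toy plugin backed by a directory on local disk."""

    def __init__(self, root: str):
        self._root = Path(root)

    def read_text(self, path: str) -> str:
        return (self._root / path.lstrip("/")).read_text()


def count_words(fs: FileSystem, path: str) -> int:
    """'Application' code: depends only on the interface, never on a backend."""
    return len(fs.read_text(path).split())


memory_fs = InMemoryFileSystem({"/logs/a.txt": "data stays in place"})
print(count_words(memory_fs, "/logs/a.txt"))

with tempfile.TemporaryDirectory() as root:
    Path(root, "b.txt").write_text("same code different store")
    print(count_words(LocalDirFileSystem(root), "/b.txt"))
```

The point of the design is that `count_words` never changes when the storage backend does, which is the property that lets MapReduce, HBase, and other clients run over alternative filesystems.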

Page 9:

Hadoop FileSystem Configuration for HDFS

[Diagram labels: Any Application; HBase, MapReduce, YARN, Hadoop FS Clients; Hadoop FileSystem Interface; Hadoop FileSystem; HDFS Plugin; HDFS]
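For HDFS, the configuration largely comes down to pointing Hadoop's default filesystem at a NameNode. The sketch below renders such a setting in the `<configuration>`/`<property>` layout that Hadoop's `core-site.xml` uses. `fs.defaultFS` is the standard property name; the host name and port are placeholders, and the helper `render_core_site` is hypothetical.

```python
import xml.etree.ElementTree as ET


def render_core_site(properties: dict) -> str:
    """Render name/value pairs in Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Host name and port are placeholders, not values from the talk.
hdfs_site = render_core_site({"fs.defaultFS": "hdfs://namenode.example.com:8020"})
print(hdfs_site)
```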

Page 10:

What are some examples of where big data is stored?

- Object Stores

- NoSQL Stores

- Distributed FileSystems

- Network Filers

- Databases

CC flickr birdwatcher63

Page 11:

Network Filer Example

Hadoop FileSystem Configuration for GlusterFS

[Diagram labels: Any Application; HBase, MapReduce, YARN, Hadoop FS Clients; Hadoop FileSystem Interface; Hadoop FileSystem; GlusterFS Plugin]
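Switching Hadoop to an alternative filesystem such as GlusterFS is mostly a matter of configuration: a URI scheme is mapped to an implementation class through a property named `fs.<scheme>.impl`. The sketch below models that lookup with a plain dictionary. The GlusterFS class name is taken from the glusterfs-hadoop project as I understand it and should be verified against the plugin version in use; the function `implementation_for` is hypothetical.

```python
from urllib.parse import urlparse

# Property names follow Hadoop's fs.<scheme>.impl convention. The GlusterFS
# class name is an assumption to be checked against the plugin's own docs.
conf = {
    "fs.defaultFS": "glusterfs:///",
    "fs.glusterfs.impl": "org.apache.hadoop.fs.glusterfs.GlusterFileSystem",
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
}


def implementation_for(path: str, conf: dict) -> str:
    """Return the configured class name that would serve the given path."""
    # A path without a scheme falls back to the default filesystem's scheme.
    scheme = urlparse(path).scheme or urlparse(conf["fs.defaultFS"]).scheme
    key = f"fs.{scheme}.impl"
    if key not in conf:
        raise KeyError(f"no FileSystem configured for scheme {scheme!r}")
    return conf[key]


print(implementation_for("/user/steve/logs", conf))
print(implementation_for("hdfs://namenode.example.com:8020/tmp", conf))
```

Because a path without a scheme resolves through `fs.defaultFS`, existing jobs that use plain paths pick up the new filesystem without code changes.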

Page 12:

Network Filer - Apache Hadoop on GlusterFS

[Architecture diagram. Labels: Management Server running the Resource Manager (Hadoop Master Services); Server 1, Server 2 ... Server 50 (Hadoop Workers), each a GlusterFS Trusted Peer with a DAS Brick, a Node Manager, and a FUSE mount with the Hadoop plugin; GlusterFS also exposed via NFS and SWIFT]

Page 13:

Object Store Example

Hadoop FileSystem Configuration for SWIFT

[Diagram labels: Any Application; HBase, MapReduce, YARN, Hadoop FS Clients; Hadoop FileSystem Interface; Hadoop FileSystem; SWIFT Plugin; SWIFT]
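With an object store, the filesystem URI has to identify a container rather than a host. As I understand the Hadoop OpenStack Swift convention, paths take the form `swift://<container>.<service>/<object path>`, where the service name refers to a set of endpoint and credential settings in the configuration. The sketch below splits such a URI into its parts; the function `parse_swift_uri` and the example names are hypothetical.

```python
from urllib.parse import urlparse


def parse_swift_uri(uri: str) -> tuple:
    """Split swift://<container>.<service>/<object path> into its three parts."""
    parsed = urlparse(uri)
    if parsed.scheme != "swift":
        raise ValueError(f"not a swift URI: {uri}")
    # The authority is "<container>.<service>"; split on the first dot.
    container, dot, service = parsed.netloc.partition(".")
    if not (container and dot and service):
        raise ValueError(f"expected swift://container.service/path, got: {uri}")
    return container, service, parsed.path.lstrip("/")


# Container and service names are placeholders.
print(parse_swift_uri("swift://weblogs.rhstore/2013/05/access.log"))
```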

Page 14:

NoSQL Example

Hadoop FileSystem Configuration for CassandraFS

[Diagram labels: Any Application; HBase, MapReduce, YARN, Hadoop FS Clients; Hadoop FileSystem Interface; Hadoop FileSystem; CassandraFS Plugin]

Page 15:

NoSQL - Apache Hadoop on CassandraFS

Page 16:

CC flickr syume

We are working on filesystem tests within Apache Hadoop Common and Apache Bigtop, as well as opening up ecosystem tools
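The idea behind shared filesystem tests is that one suite of behavioural checks runs unchanged against every implementation, so each plugin can show it honours the same contract. Hadoop's own tests are written in Java; the sketch below only models the idea in Python. The specific checks, `check_contract`, and `InMemoryFileSystem` are illustrative assumptions, not the Hadoop contract itself.

```python
class InMemoryFileSystem:
    """Toy filesystem used only to exercise the checks below."""

    def __init__(self):
        self._files = {}

    def write(self, path: str, data: bytes) -> None:
        self._files[path] = data

    def read(self, path: str) -> bytes:
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]

    def delete(self, path: str) -> bool:
        # True only if something was actually removed.
        return self._files.pop(path, None) is not None


def check_contract(fs) -> list:
    """Run the same behavioural checks against any implementation.

    Returns the names of the checks that failed (empty list: all passed).
    """
    failures = []

    fs.write("/contract/a", b"payload")
    if fs.read("/contract/a") != b"payload":
        failures.append("read-after-write")

    if not fs.delete("/contract/a"):
        failures.append("delete-existing")
    if fs.delete("/contract/a"):
        failures.append("delete-missing-returns-false")

    try:
        fs.read("/contract/a")
        failures.append("read-missing-raises")
    except FileNotFoundError:
        pass

    return failures


print(check_contract(InMemoryFileSystem()))
```

A new plugin would be passed to the same `check_contract` function; any name in the returned list points at a behaviour that differs from what clients expect.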

Page 17:

Page 18:

Page 19:

Closing Remarks

1. The number of Hadoop FileSystems available to you continues to increase

2. This is good! A vibrant ecosystem gives you choice

3. Evaluate the option of analyzing your data in place before deploying new environments

CC flickr zoomboy1