Steve Watt, Chief Architect, Hadoop and Big Data, Red Hat - 21st BDL Meetup
DESCRIPTION
"Hadoop Analytics on your data in place" Steve Watt leads engineering for the Hadoop and Big Data program at Red Hat. Most recently Steve has been focusing on Hadoop interoperability and on better enabling Hadoop support for alternative filesystems. Prior to Red Hat, Steve spent 2 years at Hewlett-Packard, first co-founding the Hadoop business and then leading engineering as the Hadoop CTO. Prior to HP, Steve was at IBM for 10 years, where he created IBM's first Hadoop distribution and was part of the team that built BigSheets, the first spreadsheet interface for Hadoop.
TRANSCRIPT
@wattsteve
Analytics on your data in place
Steve Watt, Red Hat
CC flickr Barta IV
Hadoop at Red Hat
But tonight I have my community hat on
CC flickr wcdumonts
Hadoop in 2007
Platform Layers: Technologies
Computational Runtimes: MapReduce, HBase
FileSystems: HDFS or Amazon S3
Infrastructures: x86 or Amazon EC2
CC flickr wwarby
Hadoop in 2013
CC flickr lowfatbrains
Platform Layers: Technologies
Computational Runtimes: YARN, Giraph, MapReduce, HBase, Phoenix, Spark/BDAS, Drill, Impala, Stinger
FileSystems: HDFS + 13 Other Hadoop FileSystems
Infrastructures: System on a Chip, x86, Virtualization and Cloud
CC flickr grufnik
Observation #1: The Hadoop FileSystem Interface is the keystone of the entire Ecosystem
CC flickr traftery
Observation #2: Moving data around just to analyze it is slow and expensive, especially if it requires a redundant repository
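To put a rough number on "slow", the sketch below estimates how long a bulk copy takes before any analysis can start. The dataset size and link speed are illustrative assumptions, not figures from the talk, and the calculation ignores protocol overhead and contention, so real transfers would likely take longer.

```python
def copy_time_hours(dataset_bytes, link_bits_per_second, efficiency=1.0):
    """Hours needed to move dataset_bytes over a link.

    efficiency is the assumed fraction of the nominal link rate
    that the transfer actually achieves (1.0 = ideal).
    """
    seconds = (dataset_bytes * 8) / (link_bits_per_second * efficiency)
    return seconds / 3600


# Illustrative assumption: 100 TB copied over a dedicated 10 Gb/s link.
hours = copy_time_hours(100e12, 10e9)
print(f"{hours:.1f} hours")  # 22.2 hours
```

Storing the copy in a second repository also doubles the capacity that has to be bought and managed, which is the "redundant repository" cost the slide refers to.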
So how does this work?
By leveraging Hadoop’s pluggable FileSystem architecture
[Diagram: Any Application, HBase, MapReduce, YARN, Hadoop FS Clients -> Hadoop FileSystem Interface -> Hadoop FileSystem Plugin -> FileSystem Implementation]
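The pluggable idea can be sketched as a registry that maps a URI scheme to an implementation of one shared interface. This is a toy model in Python, not Hadoop's API (Hadoop's real interface is the Java class org.apache.hadoop.fs.FileSystem); the "mem" scheme, the class names and the two-method interface are all hypothetical.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class FileSystem(ABC):
    """Toy stand-in for the interface that every storage plugin implements."""

    @abstractmethod
    def read(self, path):
        ...

    @abstractmethod
    def write(self, path, data):
        ...


class InMemoryFileSystem(FileSystem):
    """Hypothetical plugin that keeps file contents in a dict."""

    def __init__(self):
        self._files = {}

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        self._files[path] = data


# Registry: URI scheme -> plugin class ("mem" is a made-up scheme).
PLUGINS = {"mem": InMemoryFileSystem}
_instances = {}


def get_filesystem(uri):
    """Pick the plugin from the URI scheme; reuse one instance per scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in PLUGINS:
        raise ValueError(f"no FileSystem plugin registered for scheme {scheme!r}")
    if scheme not in _instances:
        _instances[scheme] = PLUGINS[scheme]()
    return _instances[scheme]


fs = get_filesystem("mem:///logs/day1.txt")
fs.write("/logs/day1.txt", b"hello")
print(fs.read("/logs/day1.txt"))  # b'hello'
```

Callers only ever see the interface, so adding a new store means registering one more class rather than changing the applications.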
Hadoop FileSystem Configuration for HDFS
[Diagram: Any Application, HBase, MapReduce, YARN, Hadoop FS Clients -> Hadoop FileSystem Interface -> HDFS Plugin -> HDFS]
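As a rough model of what "configuration" means here: a default filesystem URI decides which scheme an unqualified path belongs to, and a per-scheme property names the implementing class. The sketch assumes the Hadoop property names fs.defaultFS and fs.<scheme>.impl and the HDFS class name as I understand them for Hadoop 2.x; the host name is hypothetical, the lookup logic is a simplification rather than Hadoop's own code, and paths are assumed to be absolute.

```python
from urllib.parse import urlparse


def resolve_impl(conf, path):
    """Return (fully qualified URI, implementation class name) for a path."""
    parsed = urlparse(path)
    if parsed.scheme:
        # Path already names its filesystem, e.g. hdfs://host:port/dir
        scheme, uri = parsed.scheme, path
    else:
        # Unqualified path: borrow scheme and authority from the default FS.
        default = urlparse(conf["fs.defaultFS"])
        scheme = default.scheme
        uri = f"{scheme}://{default.netloc}{path}"
    return uri, conf[f"fs.{scheme}.impl"]


conf = {
    "fs.defaultFS": "hdfs://namenode.example.com:8020",  # hypothetical host
    "fs.hdfs.impl": "org.apache.hadoop.hdfs.DistributedFileSystem",
}

print(resolve_impl(conf, "/user/data"))
```

Pointing the same two properties at a different scheme and class is, in this model, all it takes to move the applications onto another store.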
What are some examples of where big data is stored?
- Object Stores
- NoSQL Stores
- Distributed FileSystems
- Network Filers
- Databases
CC flickr birdwatcher63
Network Filer Example
Hadoop FileSystem Configuration for GlusterFS
[Diagram: Any Application, HBase, MapReduce, YARN, Hadoop FS Clients -> Hadoop FileSystem Interface -> GlusterFS Plugin]
Network Filer - Apache Hadoop on GlusterFS
[Diagram: GlusterFS across Server 1, Server 2 ... Server 50. Labels: Management Server; Hadoop Master Services (Resource Manager); Hadoop Workers (a Node Manager per server); a Trusted Peer and a DAS Brick per server; FUSE and plugin on each Hadoop node; NFS and SWIFT access.]
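The FUSE labels in this diagram suggest that each Hadoop node sees the volume as an ordinary mounted directory. Under that assumption, "analytics in place" can be sketched as analysis code that depends only on a small filesystem interface, with a plugin that reads straight from the mount point. The class, method names and sample files below are hypothetical, and a temporary directory stands in for the mounted volume.

```python
import os
import tempfile
from collections import Counter


class MountedDirFileSystem:
    """Hypothetical plugin: serves files from a directory that is already mounted."""

    def __init__(self, mount_point):
        self.mount_point = mount_point

    def _local(self, path):
        # Map a filesystem path such as /logs/a.txt onto the mount point.
        return os.path.join(self.mount_point, path.lstrip("/"))

    def list_files(self, directory):
        names = sorted(os.listdir(self._local(directory)))
        return [directory.rstrip("/") + "/" + name for name in names]

    def read_text(self, path):
        with open(self._local(path), encoding="utf-8") as f:
            return f.read()


def word_count(fs, directory):
    """Analytics code written only against list_files and read_text."""
    counts = Counter()
    for path in fs.list_files(directory):
        counts.update(fs.read_text(path).split())
    return counts


# Stand-in for a mounted volume: a temporary directory with two small files.
with tempfile.TemporaryDirectory() as mount:
    os.makedirs(os.path.join(mount, "logs"))
    for name, text in [("a.txt", "error ok ok"), ("b.txt", "ok error")]:
        with open(os.path.join(mount, "logs", name), "w", encoding="utf-8") as f:
            f.write(text)
    counts = word_count(MountedDirFileSystem(mount), "/logs")
    print(counts["ok"], counts["error"])  # 3 2
```

Nothing is copied into a second store: the job reads the files where they already live.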
Object Store Example
Hadoop FileSystem Configuration for SWIFT
[Diagram: Any Application, HBase, MapReduce, YARN, Hadoop FS Clients -> Hadoop FileSystem Interface -> SWIFT Plugin -> SWIFT]
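Object stores keep objects under flat names, while Hadoop clients expect directories, so a plugin has to bridge the two. One common approach, shown here as a sketch and not necessarily how the SWIFT plugin does it, is to treat "/" in object names as a separator and derive one-level listings from key prefixes. The function name and sample keys are hypothetical.

```python
def list_directory(keys, directory):
    """Emulate a one-level directory listing over a flat object namespace.

    Sub-directories are returned with a trailing "/", files without.
    """
    prefix = directory.strip("/")
    prefix = prefix + "/" if prefix else ""
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if not rest:
            continue
        head, sep, _ = rest.partition("/")
        entries.add(head + "/" if sep else head)
    return sorted(entries)


# Hypothetical object names in one container.
keys = ["logs/2013/01.txt", "logs/2013/02.txt", "logs/readme.txt", "tmp/x"]
print(list_directory(keys, "/logs"))  # ['2013/', 'readme.txt']
print(list_directory(keys, "/"))      # ['logs/', 'tmp/']
```

Scanning every key is only workable for a sketch; a real store would normally be asked to filter by prefix on the server side.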
NoSQL Example
Hadoop FileSystem Configuration for CassandraFS
[Diagram: Any Application, HBase, MapReduce, YARN, Hadoop FS Clients -> Hadoop FileSystem Interface -> CassandraFS Plugin]
NoSQL - Apache Hadoop on CassandraFS
CC flickr syume
We are working on filesystem tests within Apache Hadoop-Common and Apache Bigtop, as well as opening up ecosystem tools
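The value of shared filesystem tests is that one suite can be run unchanged against every plugin. The sketch below shows the shape of such a contract check; it is an illustration only, not the tests in Hadoop-Common or Bigtop, and the method names and the in-memory implementation are hypothetical.

```python
class DictFileSystem:
    """Hypothetical in-memory implementation used to exercise the checks."""

    def __init__(self):
        self._files = {}

    def create(self, path, data):
        self._files[path] = bytes(data)

    def open(self, path):
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]

    def exists(self, path):
        return path in self._files

    def delete(self, path):
        # True only if something was actually removed.
        return self._files.pop(path, None) is not None


def check_contract(fs):
    """Run the same behavioural checks against any implementation.

    Returns a list of failure messages; an empty list means all checks passed.
    """
    failures = []
    fs.create("/t/file", b"abc")
    if fs.open("/t/file") != b"abc":
        failures.append("read-after-write returned different bytes")
    if not fs.delete("/t/file"):
        failures.append("delete of existing file returned False")
    if fs.exists("/t/file"):
        failures.append("file still exists after delete")
    if fs.delete("/t/file"):
        failures.append("delete of missing file returned True")
    try:
        fs.open("/t/file")
        failures.append("open of missing file did not raise")
    except FileNotFoundError:
        pass
    return failures


print(check_contract(DictFileSystem()))  # []
```

A plugin that disagrees with the others on any of these behaviours shows up as a named failure instead of as a subtle bug in an ecosystem tool.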
Closing Remarks
1. The number of Hadoop FileSystems available to you continues to increase
2. This is good! A vibrant ecosystem gives you choice
3. Evaluate the option of analyzing your data in place before deploying new environments
CC flickr zoomboy1