Hadoop at the Center: The Next Generation of Hadoop

Data Marketing 2014, Toronto
Adam Muise, Principal Architect, Hortonworks

Posted on 14 July 2015

Page 1: Hadoop at the Center: The Next Generation of Hadoop

HADOOP AT THE CENTER: THE NEXT GENERATION OF HADOOP
DATA MARKETING 2014, TORONTO

Adam Muise, Principal Architect, Hortonworks

Page 2

Who am I?

Page 3

Who is Hortonworks?

Page 4

We do Hadoop

The leaders of Hadoop's development

Community driven, Enterprise Focused

Drive Innovation in the platform – We lead the roadmap

100% Open Source – Democratized Access to Data

Page 5

We do Hadoop successfully.
• Develop Open Source Hadoop
• Distribute Hadoop with HDP
• Support
• Professional Services
• Training

Page 6

Hortonworks Approach

1. Innovate the Core
Architect and build innovation at the core of Hadoop
• YARN: Data Operating System
• HDFS as the storage layer
• Key processing engines

2. Extend Hadoop as an Enterprise Data Platform
Extend Hadoop with enterprise capabilities for governance, security & operations
Apply enterprise software rigor to the open source development process

3. Enable the Ecosystem
Enable the leaders in the data center to easily adopt & extend their platforms
• Establish Hadoop as a standard component of a modern data architecture
• Joint engineering

[Diagram: HDP 2.2. Data Access engines running on YARN: Data Operating System: Script (Pig), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Batch (MapReduce), Memory (Spark). Data Management: HDFS (Hadoop Distributed File System). Side panels: Governance & Integration, Security, Operations.]

Page 7

4. Innovating within the community for the enterprise
• Open Source: fastest path to innovation for a platform technology
• Complete open source platform speeds enterprise and ecosystem adoption and minimizes lock-in
• Enables the market to function much bigger, much faster

…all done completely in Open Source

[Diagram: HDP 2.2. Script (Pig), Memory (Spark), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), and Batch (MapReduce) as Data Access engines on YARN: Data Operating System, over HDFS (Hadoop Distributed File System) for Data Management, with Governance & Integration, Security, and Operations panels.]

Driving our innovation through Apache Software Foundation Projects

Apache Project   Committers   PMC Members
Hadoop           27           20
Pig              5            5
Hive             16           4
Tez              15           15
HBase            6            4
Phoenix          4            4
Accumulo         2            2
Storm            3            2
Slider           10           10
Flume            1            0
Sqoop            1            1
Ambari           32           27
Oozie            3            2
Zookeeper        2            1
Knox             11           5
Argus            10           n/a
Falcon           5            3
TOTAL            153          105

Page 8

Let's talk challenges…

Page 9

Volume Volume Volume Volume

Page 10

[The word "Volume" repeated 17 times across the slide]

Page 11

[The word "Volume" repeated many more times, filling most of the slide]

Page 12

[The word "Volume" repeated still more times, filling the entire slide]

Page 13

Storage, Management, Processing all become challenges with Data at Volume

Page 14

Traditional technologies adopt a divide, drop, and conquer approach

Page 15

The solution? EDW

[Diagram: data split across separate silos: EDW, Yet Another EDW, Analytical DB, OLTP, Another EDW]

Page 16

Ummm…you dropped something

[Diagram: loose data scattered outside the silos EDW, Yet Another EDW, Analytical DB, OLTP, and Another EDW]

Page 17

What keeps us from our Data?

Page 18

Data Silos. Your data silos are lonely places.

[Diagram: separate silos for EDW, Accounts, Customers, and Web Properties]

Page 19

… Data likes to be together.

[Diagram: the EDW, Accounts, Customers, and Web Properties silos]

Page 20

Data likes to socialize too.

[Diagram: EDW, Accounts, Customers, and Web Properties alongside Machine Data, Twitter, Facebook, CDR, and Weather Data]

Page 21

New types of data don't quite fit into your pristine view of the world.

[Diagram: "My Little Data Empire" next to Logs and Machine Data, with question marks]

Page 22

To resolve this, some people take hints from The Lord of the Rings...

Page 23

…and create One-Schema-To-Rule-Them-All…

[Diagram: a single EDW with one Schema]

Page 24

…but that has its problems too.

[Diagram: two EDWs, each with its own Schema and several ETL processes feeding it]

Page 25

What if the data were processed and stored centrally? What if you didn't need to force it into a single schema?

We call it a Modern Data Architecture*

*AKA Data Lake
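Not forcing data into a single schema up front is commonly called schema-on-read: raw records are stored as they arrive, and each consumer applies its own schema when it reads them. The Python sketch below illustrates only that idea; the JSON-lines records, the field names, and the `read_with_schema` helper are hypothetical and are not part of Hadoop or any HDP component. In HDP the comparable mechanism would be something like a Hive external table defined over files already sitting in HDFS.

```python
import json
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

# Hypothetical raw records, stored exactly as they arrived (JSON lines).
# Nothing checks their shape on write; fields vary from record to record.
RAW_RECORDS: List[str] = [
    '{"customer_id": "42", "channel": "web", "spend": "19.99"}',
    '{"customer_id": "7", "channel": "store"}',
    '{"customer_id": "42", "spend": "5", "weather": "rain"}',
]

# A read-time schema maps each wanted field name to a conversion function.
Schema = Dict[str, Callable[[Any], Any]]


def read_with_schema(lines: Iterable[str], schema: Schema) -> Iterator[Dict[str, Optional[Any]]]:
    """Apply a schema while reading: keep only the requested fields,
    convert each present value, and use None for missing fields."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: convert(record[field]) if field in record else None
            for field, convert in schema.items()
        }


# Two consumers read the same raw data through different schemas.
SPEND_VIEW: Schema = {"customer_id": int, "spend": float}
CHANNEL_VIEW: Schema = {"customer_id": int, "channel": str}

if __name__ == "__main__":
    for row in read_with_schema(RAW_RECORDS, SPEND_VIEW):
        print(row)
```

Because the raw records are never rewritten, adding a new consumer only means defining a new schema dictionary, not reloading the data.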

Page 26

A Modern Data Architecture
• Consolidate siloed data sets, structured and unstructured
• Central data set on a single cluster
• Multiple workloads across batch, interactive, and real time
• Central services for security, governance, and operations
• Preserve existing investment in current tools and platforms
• Single view of the customer, product, supply chain

[Diagram: Sources (existing systems such as CRM, ERP, and others, plus clickstream, web & social, geolocation, sensor & machine, server logs, and unstructured data) feed a data system built on HDFS (Hadoop Distributed File System) and YARN: Data Operating System, running batch, interactive, and real-time workloads alongside existing RDBMS, EDW, and MPP systems. Applications on top: Business Analytics, Custom Applications, Packaged Applications.]
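At its simplest, a "single view of the customer" means grouping records from each former silo under a shared key. The Python sketch below assumes every silo exports records with a common `customer_id` field (real sources rarely share such a clean key, so matching usually needs extra work); the silo names, fields, and the `single_view` helper are all hypothetical. On a cluster, the same grouping would more typically run as a Hive, Pig, or Spark job over data in HDFS.

```python
from collections import defaultdict
from typing import Any, DefaultDict, Dict, List

Record = Dict[str, Any]

# Hypothetical extracts from three silos; each record carries a shared customer_id.
SILOS: Dict[str, List[Record]] = {
    "accounts": [{"customer_id": 1, "plan": "mobile-basic"}],
    "customers": [
        {"customer_id": 1, "city": "Toronto"},
        {"customer_id": 2, "city": "Ottawa"},
    ],
    "web": [{"customer_id": 1, "page": "/tea-cozies", "minutes": 25}],
}


def single_view(silos: Dict[str, List[Record]]) -> Dict[int, Dict[str, List[Record]]]:
    """Group every record from every silo under its customer_id,
    remembering which silo each record came from."""
    view: DefaultDict[int, DefaultDict[str, List[Record]]] = defaultdict(
        lambda: defaultdict(list)
    )
    for silo_name, records in silos.items():
        for record in records:
            # Drop the join key from the stored attributes; it is the outer key.
            attributes = {k: v for k, v in record.items() if k != "customer_id"}
            view[record["customer_id"]][silo_name].append(attributes)
    # Convert the nested defaultdicts to plain dicts for a clean result.
    return {cid: dict(sources) for cid, sources in view.items()}


if __name__ == "__main__":
    for customer_id, sources in sorted(single_view(SILOS).items()):
        print(customer_id, sources)
```

Keeping each record tagged with its source silo, rather than flattening everything into one wide row, avoids forcing the sources into a single schema.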

Page 27

What do you want to do with your data?

Page 28

Marketing Analytics needs data. Work with the population, not just a sample.

Page 29

Your segmentation today.

Male
Female
Age: 25-30
Town/City
Middle Income Band
Product Category Preferences

Page 30

Your segmentation with better data.

Male
Female
Age: 27 but feels old
GPS coordinates
$65-68k per year
Product recommendations per time of day and per weather
Tea Party Hippie
Looking to start a business
Walking into Starbucks right now…
A depressed Toronto Maple Leafs fan
Products left in basket indicate drunk Amazon shopper
Purchase history indicates a risk taker
Thinking about a new house
Unhappy with his cell phone plan
Pregnant
Spent 25 minutes looking at tea cozies

Page 31

Pick up all of that data that was prohibitively expensive to store and use.

Page 32

To approach these use cases you need an affordable platform that stores, processes, and analyzes the data.

Page 33

Don't wait for your data. Batch is often too late to influence the person who is in your store or on your website right now.

Page 34

Stream Processing, Search, and Storage

[Diagram: Hortonworks Data Platform 2.2 on YARN and HDFS, with Apache Kafka for streaming ingest of real-time data feeds, Storm for real-time stream processing, HBase for online data processing, Solr for search, Slider, and Hive for SQL.]

Stream data into Hadoop and process it in near real-time

Page 35

How? With Hortonworks Data Platform*

*AKA Hadoop

Page 36

What's New in HDP 2.2

New and Improved YARN Ready Engines
• Enterprise SQL at Hadoop Scale with Stinger.next
• Enterprise Ready Spark on YARN
• Deep YARN integration for real-time engines: HBase, Accumulo, Storm
• Enabling ISVs with a general SDK and API for direct YARN integration
• Only solution to provide real-time to micro-batch for analyzing the internet of things
• Other engines/tools: Solr, Cascading

Continued Innovation of Central Enterprise Services
• Centralized security administration and policy enforcement
• Ease of use and operations agility features to speed cluster deployment
• 100% uptime target with cluster rolling upgrades

Expanded Deployment Options
• Enhanced business continuity with replication/archival across on-premises and cloud storage tiers (Azure Blob, S3)
• Simultaneous ship of Windows and Linux installs
• Expand Azure support beyond HDInsight Azure to include HDP for Windows or Linux in Azure VMs

HDP 2.2 Delivering Apache Hadoop for the Enterprise

Page 37

Complete List of New Features in HDP 2.2

Apache Hadoop YARN
• Slide existing services onto YARN through 'Slider'
• GA release of HBase, Accumulo, and Storm on YARN
• Support long running services: handling of logs, containers not killed when AM dies, secure token renewal, YARN Labels for tagging nodes for specific workloads
• Support for CPU Scheduling and CPU Resource Isolation through CGroups

Apache Hadoop HDFS
• Heterogeneous storage: Support for archival
• Rolling Upgrade (This is an item that applies to the entire HDP Stack. YARN, Hive, HBase, everything. We now support comprehensive Rolling Upgrade across the HDP Stack.)
• Multi-NIC Support
• Heterogeneous storage: Support memory as a storage tier (TP)
• HDFS Transparent Data Encryption (TP)

Apache Hive, Apache Pig, and Apache Tez
• Hive Cost Based Optimizer: Function Pushdown & Join re-ordering support for other join types: star & bushy
• Hive SQL Enhancements including:
• ACID Support: Insert, Update, Delete
• Temporary Tables
• Metadata-only queries return instantly
• Pig on Tez
• Including DataFu for use with Pig
• Vectorized shuffle
• Tez Debug Tooling & UI

Hue
• Support for HiveServer 2
• Support for Resource Manager HA

Apache Spark
• Refreshed Tech Preview to Spark 1.1.0 (available now)
• ORC File support & Hive 0.13 integration
• Planned for GA of Spark 1.2.0
• Operations integration via YARN ATS and Ambari
• Security: Authentication

Apache Solr
• Added Banana, a rich and flexible UI for visualizing time series data indexed in Solr

Cascading
• Cascading 3.0 on Tez distributed with HDP (coming soon)

Apache Falcon
• Authentication Integration
• Lineage: now GA (it's been a tech preview feature…)
• Improve UI for pipeline management & editing: list, detail, and create new (from existing elements)
• Replicate to Cloud: Azure & S3

Apache Sqoop, Apache Flume & Apache Oozie
• Sqoop import support for Hive types via HCatalog
• Secure Windows cluster support: Sqoop, Flume, Oozie
• Flume streaming support: sink to HCat on secure cluster
• Oozie HA now supports secure clusters
• Oozie Rolling Upgrade
• Operational improvements for Oozie to better support Falcon
• Capture workflow job logs in HDFS
• Don't start new workflows for re-run
• Allow job property updates on running jobs

Apache HBase, Apache Phoenix, & Apache Accumulo
• HBase & Accumulo on YARN via Slider
• HBase HA
• Replicas update in real-time
• Fully supports region split/merge
• Scan API now supports standby RegionServers
• HBase Block cache compression
• HBase optimizations for low latency
• Phoenix Robust Secondary Indexes
• Performance enhancements for bulk import into Phoenix
• Hive over HBase Snapshots
• Hive Connector to Accumulo
• HBase & Accumulo wire-level encryption
• Accumulo multi-datacenter replication

Apache Storm
• Storm-on-YARN via Slider
• Ingest & notification for JMS (IBM MQ not supported)
• Kafka bolt for Storm: supports sophisticated chaining of topologies through Kafka
• Kerberos support
• Hive update support: Streaming Ingest
• Connector improvements for HBase and HDFS
• Deliver Kafka as a companion component
• Kafka install, start/stop via Ambari
• Security Authorization Integration with Ranger

Apache Slider
• Allow on-demand create and run different versions of heterogeneous applications
• Allow users to configure different application instances differently
• Manage operational lifecycle of application instances
• Expand / shrink application instances
• Provide application registry for publish and discovery

Apache Knox & Apache Ranger (Argus) & HDP Security
• Apache Ranger: Support authorization and auditing for Storm and Knox
• Introducing REST APIs for managing policies in Apache Ranger
• Apache Ranger: Support native grant/revoke permissions in Hive and HBase
• Apache Ranger: Support Oracle DB and storing of audit logs in HDFS
• Apache Ranger to run on Windows environment
• Apache Knox to protect YARN RM
• Apache Knox support for HDFS HA
• Apache Ambari install, start/stop of Knox

Apache Ambari
• Support for HDP 2.2 Stack, including support for Kafka, Knox and Slider
• Enhancements to Ambari Web configuration management including: versioning, history and revert, setting final properties and downloading client configurations
• Launch and monitor HDFS rebalance
• Perform Capacity Scheduler queue refresh
• Configure High Availability for ResourceManager
• Ambari Administration framework for managing user and group access to Ambari
• Ambari Views development framework for customizing the Ambari Web user experience
• Ambari Stacks for extending Ambari to bring custom Services under Ambari management
• Ambari Blueprints for automating cluster deployments
• Performance improvements and enterprise usability guardrails

Page 38

Hortonworks Data Platform: A comprehensive data management platform

[Diagram: Hortonworks Data Platform 2.2. YARN: Data Operating System (Cluster Resource Management) over HDFS (Hadoop Distributed File System), supporting batch, interactive & real-time data access: Script (Pig), SQL (Hive), and Java/Scala (Cascading), each on Tez; Stream (Storm) and NoSQL (HBase, Accumulo), each on Slider; Search (Solr); In-Memory (Spark); Others (ISV Engines). Governance: Data Workflow, Lifecycle & Governance with Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS. Security: Authentication, Authorization, Accounting, and Data Protection across Storage: HDFS, Resources: YARN, Access: Hive, …, Pipeline: Falcon, Cluster: Knox, Cluster: Ranger. Operations: Provision, Manage & Monitor (Ambari, Zookeeper) and Scheduling (Oozie). Deployment Choice: Linux, Windows, On-Premises, Cloud.]

YARN is the architectural center of HDP
Enables batch, interactive and real-time workloads
Provides comprehensive enterprise capabilities
The widest range of deployment options

Delivered Completely in the OPEN

Page 39

HDP = Apache Hadoop

[Chart: Hortonworks Data Platform releases HDP 2.0 (October 2013), HDP 2.1 (April 2014), and HDP 2.2 (October 2014), showing the Apache component versions shipped in each, grouped into Data Management, Data Access, Governance & Integration, Security, and Operations. Components: Hadoop & YARN, Pig, Hive & HCatalog, HBase, Sqoop, Oozie, Zookeeper, Ambari, Storm, Flume, Knox, Phoenix, Accumulo, Falcon, Ranger, Spark, Kafka, Tez, Slider, Solr.]

Page 40

What else are we working on?

hortonworks.com/labs/

Page 41

Hadoop is the new Data Operating System for the Enterprise

Page 42

© Hortonworks Inc. 2012: DO NOT SHARE. CONTAINS HORTONWORKS CONFIDENTIAL & PROPRIETARY INFORMATION

There is NO second place

Hortonworks …the Bull Elephant of Hadoop Innovation