Demystify Big Data Breakfast Briefing: Herb Cunitz, Hortonworks
DESCRIPTION
Demystify Big Data Breakfast Briefing, 9th July, London - Herb Cunitz, Hortonworks
TRANSCRIPT
© Hortonworks Inc. 2013. Confidential and Proprietary.
Hadoop in London
July 9, 2013
Herb Cunitz
Hortonworks President
@hcunitz
Why is Hadoop Important?
We Believe that More than Half the World's Data Will Be Processed by Apache Hadoop.
By 2015, Organizations that Build a Modern Information Management System Will
Outperform their Peers Financially by 20 Percent.
– Gartner, Mark Beyer, “Information Management in the 21st Century”
Traditional Data Architecture
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP) [Pressured]
DATA SOURCES: OLTP, POS SYSTEMS
Traditional Sources (RDBMS, OLTP, OLAP)
New Sources (sentiment, clickstream, geo, sensor, …)
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST
Pressured Traditional Data Architecture
Source: IDC
New Sources (sentiment, clickstream, geo, sensor, …)
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
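To put the two IDC volume figures above side by side, the short Python sketch below works out the growth multiple and the annual growth rate they imply. It assumes smooth year-over-year compounding between 2012 and 2020, which the slide does not state; the function name `implied_cagr` and the constants are illustrative only.

```python
# Implied growth between the two IDC figures quoted on the slide.
# Assumption (not stated in the deck): growth compounds at a constant
# annual rate between 2012 and 2020.

def implied_cagr(start_zb: float, end_zb: float, years: int) -> float:
    """Return the constant annual growth rate that turns start_zb into end_zb."""
    if start_zb <= 0 or end_zb <= 0 or years <= 0:
        raise ValueError("volumes and years must be positive")
    return (end_zb / start_zb) ** (1 / years) - 1

START_ZB = 2.8        # zettabytes in 2012, from the slide
END_ZB = 40.0         # zettabytes by 2020, from the slide
YEARS = 2020 - 2012   # eight compounding periods

growth_multiple = END_ZB / START_ZB
cagr = implied_cagr(START_ZB, END_ZB, YEARS)

print(f"Growth multiple 2012-2020: {growth_multiple:.1f}x")
print(f"Implied annual growth: {cagr:.1%}")
```

Under that assumption, the figures correspond to a roughly fourteen-fold increase, or a little under 40% growth per year.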
Modern Data Architecture Enabled
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEMS: TRADITIONAL REPOS (RDBMS, EDW, MPP), ENTERPRISE HADOOP PLATFORM
DATA SOURCES: OLTP, POS SYSTEMS
Traditional Sources (RDBMS, OLTP, OLAP)
New Sources (sentiment, clickstream, geo, sensor, …)
OPERATIONAL TOOLS: MANAGE & MONITOR
DEV & DATA TOOLS: BUILD & TEST
Agile “Data Lake” Solution Architecture
1. Capture All Data
2. Process & Structure
3. Distribute Results
4. Feedback & Retain
Dashboards, Reports, Visualization, …
Web, Mobile, CRM, ERP, Point of Sale
Business Transactions & Interactions
Business Intelligence & Analytics
Classic Data Integration & ETL
Logs & Text Data
Sentiment Data
Structured DB Data
Clickstream Data
Geo & Tracking Data
Sensor & Machine Data
Enterprise Hadoop Platform
BATCH, INTERACTIVE, STREAMING, GRAPH, IN-MEMORY, HPC MPI, ONLINE, OTHER…
Key Requirement of a “Data Lake”
Store ALL DATA in one place…
…and Interact with that data in MULTIPLE WAYS
HDFS (Redundant, Reliable Storage)
Applications Run Natively IN Hadoop
BATCH: MapReduce
INTERACTIVE: Tez
STREAMING: Storm
GRAPH: Giraph
IN-MEMORY: Spark
HPC MPI: OpenMPI
ONLINE: HBase
OTHER…: e.g. Search
YARN Takes Hadoop Beyond Batch
Applications run “IN” Hadoop versus “ON” Hadoop…
…with Predictable Performance and Quality of Service
HDFS2 (Redundant, Reliable Storage)
YARN (Cluster Resource Management)
Hadoop 2.0 Key Highlights
2.0 Architected for the Broad Enterprise
Single Cluster, Many Workloads: BATCH, INTERACTIVE, ONLINE, STREAMING
HDP 2.0 Features | Enterprise Requirements
Rolling Upgrades | ZERO downtime
Disaster Recovery | Multi Data Center
Snapshots | Point in time Recovery
Full Stack HA | Reliability
Hive on Tez | Interactive Query
YARN | Mixed workloads
Making Hadoop Enterprise Ready
OS/VM Cloud Appliance
Enterprise Hadoop Platform
PLATFORM SERVICES
Enterprise Readiness
High Availability, Disaster Recovery, Security and Snapshots
OPERATIONAL SERVICES
Manage & Operate at Scale
DATA SERVICES
Store, Process and Access Data
CORE
Distributed Storage & Processing
SQL-IN-Hadoop with Apache Hive
Stinger Initiative Focus Areas
Make Hive 100X Faster
Make Hive SQL Compliant
HDFS2
YARN
HIVE
SQL
MAPREDUCE
Business Analytics
Custom Apps
TEZ
Falcon: One-stop Shop for Data Lifecycle
Falcon: Data Lifecycle Management Framework
Data Import and Replication
Scheduling and Coordination
Data Lifecycle Policies
Multi-Cluster Management
SLA Management
Knox: Simplify Hadoop User Access
Hadoop Cluster
Authentication & Verification
Client
User Store: KDC, AD, LDAP
{REST}
Knox gateway cluster
Simplify Security
For both users and operators
Aggregate Access
Deliver unified access for a ‘single application’ feel
Client Agility
Abstract users from location of services
Innovate
Participate
Integrate
Many Communities Must Work As One
Open Source
End Users
Vendors
Ecosystem Completes the Puzzle
Data Systems
Applications, Business Tools, & Dev Tools
Infrastructure & Systems Management
Hadoop Wave ONE: Web-scale Batch Apps
Chart axes: time (x), relative % customers (y)
Innovators, technology enthusiasts
Early adopters, visionaries
The CHASM
Early majority, pragmatists
Late majority, conservatives
Laggards, skeptics
Customers want technology & performance
Customers want solutions & convenience
2006 to 2012: Web-Scale Batch Applications
Source: Geoffrey Moore - Crossing the Chasm
Hadoop Wave TWO: Broad Enterprise Apps
Chart axes: time (x), relative % customers (y)
Innovators, technology enthusiasts
Early adopters, visionaries
The CHASM
Early majority, pragmatists
Late majority, conservatives
Laggards, skeptics
Customers want technology & performance
Customers want solutions & convenience
2013 & Beyond: Batch, Interactive, Online, Streaming, etc., etc.
Source: Geoffrey Moore - Crossing the Chasm
Hortonworks – We Do Hadoop
Open Source Community
Partner Ecosystem
Commercial Adoption
Thank You