Hortonworks: What's Possible with a Modern Data Architecture?
DESCRIPTION
This is Mark Ledbetter's presentation from the September 22, 2014 Hortonworks webinar “What’s Possible with a Modern Data Architecture?” Mark is vice president for industry solutions at Hortonworks. He has more than twenty-five years of experience in the software industry, with a focus on retail and supply chain.
TRANSCRIPT
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hortonworks We Do Hadoop. We Do Retail.
September 22, 2014
Our Mission: Power your Modern Data Architecture with HDP and Enterprise Apache Hadoop
Who we are
• June 2011: Original 24 architects, developers, operators of Hadoop from Yahoo!
• June 2014: An enterprise software company with 500+ employees
Key Partners
Our model: Innovate and deliver Apache Hadoop as a complete enterprise data platform, completely in the open, backed by a world-class support organization
Fastest growing Fortune 1000 customer base
Customer Momentum
• 300+ customers in seven quarters, growing at 75+ per quarter
• Two thirds of customers come from the Fortune 1000
Experience at Scale
• 80,000 nodes under contract
• Largest Cluster in North America: 32,000 Nodes
• Largest Cluster in Europe: 1,000 Nodes
• Largest Known Cluster in APAC: 400 Nodes
• 30+ customers migrated from other distributions
Some notable migrations include many of the early adopters of Hadoop:
Enabling a Modern Data Architecture with HDP and Apache Hadoop
Spring 2014 Version 1.4
We Do Hadoop. We Do Retail.
Traditional systems under pressure
• Silos of Data
• Costly to Scale
• Constrained Schemas
…and difficult to manage new data
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: RDBMS, EDW, MPP
SOURCES: Existing Sources (CRM, ERP,…)
New Data Types: Clickstream; Geolocation; Sentiment, Web Data; Sensor, Machine Data; Unstructured docs, emails; Server logs
Why a Modern Data Architecture?
LIMITATIONS: Silos & Expensive, Single Purpose
MDA: Key Drivers
1. Leverage new types of data
2. IT optimization
3. Enable a data lake
GOALS
• Extend new data sets across existing data platforms
• Common data platform, multiple processing engines
• Batch, interactive and real time on a single data platform
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: RDBMS, EDW, MPP
SOURCES: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
HDP2 and YARN enable the Modern Data Architecture
Hortonworks architected and led development of YARN
Common data set, multiple applications
• Optionally land all data in a single cluster
• Batch, interactive & real-time use cases
• Support multi-tenant access, processing & segmentation of data
YARN: Architectural center of Hadoop
• Consistent security, governance & operations
• Ecosystem applications certified by Hortonworks to run natively in Hadoop
SOURCES: Existing Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured
APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications
DATA SYSTEM: RDBMS, EDW, MPP
YARN: Data Operating System (Batch, Interactive, Real-Time)
HDFS (Hadoop Distributed File System), nodes 1 to N
HDP delivers a comprehensive data management platform
HDP 2.1 Hortonworks Data Platform
GOVERNANCE & INTEGRATION
• Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS
• Script: Pig/Tez
• SQL: Hive/Tez, HCatalog
• NoSQL: HBase, Accumulo
• Stream: Storm
• Search: Solr
• In-Memory: Spark
• Others: ISV Engines
DATA MANAGEMENT
• YARN: Data Operating System
• HDFS (Hadoop Distributed File System), nodes 1 to N
SECURITY
• Authentication, Authorization, Accounting, Data Protection
• Storage: HDFS Resources: YARN Access: Hive, … Pipeline: Falcon Cluster: Knox
OPERATIONS
• Provision, Manage & Monitor: Ambari, Zookeeper
• Scheduling: Oozie
Deployment Choice: Linux, Windows, On-Premise, Cloud
YARN is the architectural center of HDP
• Enables batch, interactive and real-time workloads
• Single SQL engine for both batch and interactive
• Enable existing ISV apps to plug directly into Hadoop via YARN
Provides comprehensive enterprise capabilities
• Governance
• Security
• Operations
The widest range of deployment options
• Linux & Windows
• On premise & cloud
Our Approach
Hortonworks Approach
1. Innovate the Core
Architect and build innovation at the core of Hadoop
• YARN: Data Operating System
• HDFS as the storage layer
• Key processing engines
YARN: Data Operating System
• Script: Pig
• Search: Solr
• SQL: Hive/Tez, HCatalog
• NoSQL: HBase, Accumulo
• Stream: Storm
• Batch: MapReduce
HDFS (Hadoop Distributed File System)
2. Extend Hadoop as an Enterprise Data Platform
• Extend Hadoop with enterprise capabilities for governance, security & operations
• Apply enterprise software rigor to the open source development process
HDP 2.1: Governance & Integration, Security, Operations, Data Access, Data Management, YARN
3. Enable the Ecosystem
Enable the leaders in the data center to easily adopt & extend their platforms
• Establish Hadoop as a standard component of a modern data architecture
• Joint engineering
4. …all done completely in Open Source
Hortonworks contributes more to the Apache Hadoop ecosystem in the ASF than any other vendor
Hadoop is a platform decision
• Open Source: fastest path to innovation for a platform technology
• Eliminate vendor lock-in, no proprietary software
• Data center leaders have committed to the open source approach
Apache Project   Committers   PMC Members
Hadoop           26           20
Tez              15           13
Hive             15           5
HBase            7            3
Pig              5            5
Accumulo         2            2
Flume            1            0
Storm            2            2
Sqoop            1            0
Ambari           32           28
Oozie            3            2
Zookeeper        2            1
Knox             6            6
Falcon           3            3
TOTAL            120          90
The Modern Data Architecture w/ HDP
Hadoop Juices Sales in Retail
FUNCTION: USE CASES
Marketing
• 360° View of Customer: Customer Lifetime Value, Targeted Marketing Campaigns
• Segmentation
• Pricing
• Brand Sentiment Analysis
eCommerce & Customer Service
• Product Recommendation Engine
• Web Path Optimization
• Call Center Productivity
Forecasting, Allocation & Merchandizing
• Product Placement
• Store-Level Optimization of Assortment, Prices and Spaces
Procurement & Supply Chain
• Inventory Management
• Real-time Delivery Management
• Improved Order Picking
• Vendor Management
• Strategic Sourcing
Case Study: 12-month Hadoop evolution at TrueCar
12-month execution plan (vertical axis: Data Platform Capabilities)
• June 2013: Begin Hadoop Execution
• July 2013: Hortonworks Partnership
• Aug 2013: Training & Dev. Begins
• Nov 2013: Production Cluster, 60 Nodes, 2 PB
• Dec 2013: Three Production Apps (3 total)
• Jan 2014: 40% Dev. Staff Proficient
• Feb 2014: Three More Production Apps (6 total)
• May 2014: IPO
12-Month Results at TrueCar
• Six Production Hadoop Applications
• Sixty nodes/2 PB data
• Storage Costs/Compute Costs from $19/GB to $0.23/GB
“We addressed our data platform capabilities strategically as a pre-cursor to IPO.”
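The per-GB figures in the TrueCar results can be put in relative terms with a little arithmetic. The sketch below is only an illustration: the function name `cost_reduction` is hypothetical, and it assumes the two numbers ($19/GB and $0.23/GB) are directly comparable, which the slide does not spell out.

```python
def cost_reduction(before_per_gb: float, after_per_gb: float) -> tuple[float, float]:
    """Return (fold reduction, percent saved) for a change in per-GB cost.

    Hypothetical helper for illustration; not part of the presentation.
    """
    if before_per_gb <= 0 or after_per_gb <= 0:
        raise ValueError("costs must be positive")
    fold = before_per_gb / after_per_gb
    percent_saved = (before_per_gb - after_per_gb) / before_per_gb * 100
    return fold, percent_saved


# Figures quoted on the slide: from $19/GB to $0.23/GB.
fold, percent_saved = cost_reduction(19.00, 0.23)
print(f"{fold:.1f}x cheaper, {percent_saved:.1f}% saved per GB")
```

Under that assumption, the quoted change works out to roughly an 83-fold reduction, or a saving of a little under 99% per GB.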
Hortonworks Support We Do Hadoop. We Do Retail.
End to end support to ensure your Hadoop success
Hortonworks Support
Backed by the architects, builders and operators of Hadoop, Hortonworks offers the most effective and complete Hadoop support available
Support Provided
• Application Development Support
• Diagnose Install, Config & Cluster Mgmt Issues
• Access to Upgrades, Updates and Patches
• Diagnose Performance Issues
• Remote Troubleshooting
• Diagnose Loading, Processing & Query Issues
• Customer Support Portal
• Advanced Knowledgebase
Architect & Design, Development, Implementation, Production
Only Hortonworks provides unlimited support across architecture, development, implementation & production
Mission Critical Hadoop Support
Services
Hortonworks Services
Our services team ensures your Hadoop project will be delivered successfully
Services Provided
• Architecture
• Implementation
• Cluster Tuning
• Migration
• Best Practices
Training
Hortonworks University
We offer a wide range of training options backed by experts and designed to evolve your team's Hadoop proficiency
Custom Coursework
• On-site training for your team
• Customized for your requirements
Public Courses
• Offered in all geographies
• Hadoop Architect
• Hadoop Developer
• Hadoop Analyst
• Hadoop Operations
• Data Science
Hadoop is a Platform Decision
• Open Leadership: Drive innovation in the open via the Apache community-driven open source process
• Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind
• Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills
• Fastest Growing Customer and Partner Base: The largest and most experienced Hadoop adopters have standardized on Hortonworks. The data center leaders have standardized on Hortonworks.
Questions? We Do Hadoop. We Do Retail.
September 22, 2014