modern data architecture with enterprise apache hadoop · pdf filemodern data architecture...
TRANSCRIPT
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Modern Data Architecture with Enterprise Apache Hadoop
Hortonworks. We do Hadoop.
Jeff Markham
Technical Director, APAC
Page 2 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Open
LeadershipDrive innovation in the open exclusively via the Apache community-driven open source process
Enterprise
RigorEngineer, test and certify Apache Hadoop with the enterprise in mind
Ecosystem
EndorsementFocus on deep integration with existing data center technologies and skills
Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop
Our Mission:
Reseller Partners:
Headquartered in Palo Alto, CA; 300+ employees and growing
Page 3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
A data architecture under pressure from new data
AP
PLI
CA
TIO
NS
DA
TA
SY
ST
EM
REPOSITORIES
SO
UR
CE
S
Existing Sources (CRM, ERP, Clickstream, Logs)
RDBMS EDW MPP
Business
Analytics
Custom
Applications
Packaged
Applications
Source: IDC
2.8 ZB in 2012
85% from New Data Types
15x Machine Data by 2020
40 ZB by 2020
OLTP, ERP, CRM Systems
Unstructured documents, emails
Clickstream
Server logs
Sentiment, Web Data
Sensor. Machine Data
Geolocation
Page 4 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hadoop within an emerging Modern Data Architecture
OPERATIONS TOOLS
Provision, Manage &Monitor
DEV & DATA TOOLS
Build & Test
DA
TA
SY
ST
EM
REPOSITORIES
SO
UR
CE
S
RDBMS EDW MPP
OLTP, ERP,
CRM Systems
Documents,
Emails
Web Logs,
Click Streams
Social
Networks
Machine
Generated
Sensor
Data
Geolocation
Data
Go
ve
rna
nc
e
& In
teg
rati
on
Se
cu
rity
Op
era
tio
nsData Access
Data Management
AP
PLI
CA
TIO
NS
Business
Analytics
Custom
Applications
Packaged
Applications
Page 5 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hadoop: typically used for new analytic applications7
SC
AL
E
SCOPE
New Analytic AppsNew types of dataLOB-driven
Page 6 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
7 and incrementally delivers a ‘Data Lake’
SC
AL
E
SCOPE
A Modern Data Architecture/Data Lake
New Analytic AppsNew types of dataLOB-driven
RDBMS
MPP
EDW
Go
ve
rna
nc
e
& In
teg
rati
on
Se
cu
rity
Op
era
tio
nsData Access
Data Management
Data Lake
An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale
Page 7 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
ClickstreamCapture and analyze website visitors’ data trails and optimize your website
SensorsDiscover patterns in data streaming automatically from remote sensors and machines
Server LogsResearch logs to diagnose process failures and prevent security breaches
New types of dataHadoop Value:
SentimentUnderstand how your customers feel about your brand and products –right now
GeographicAnalyze location-based data to manage operations where they occur
UnstructuredUnderstand patterns in files across millions of web pages, emails, and documents
Page 8 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hadoop unlocks a new approach: Iterative Analytics
HadoopMultiple Query Engines
Iterative Process: Explore, Transform, Analyze
SQLSingle Query Engine
Repeatable Linear Process
✚
Determine
list of
questions
Design
solutions
Collect
structured
data
Ask
questions
from list
Detect
additional
questions
Batch Interactive Real-time Streaming
Current Reality
Apply schema on write
Dependent on IT
Augment w/ Hadoop
Apply schema on read
Support range of access patterns to data stored in HDFS: polymorphic access
Page 9 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
New analytic applications for new types of data
$
• Supplier Consolidation
• Supply Chain and Logistics
• Assembly Line Quality Assurance
• Proactive Maintenance
• Crowdsourced Quality Assurance
• New Account Risk Screens
• Fraud Prevention
• Trading Risk
• Maximize Deposit Spread
• Insurance Underwriting
• Accelerate Loan Processing
• Call Detail Records (CDRs)
• Infrastructure Investment
• Next Product to Buy (NPTB)
• Real-time Bandwidth Allocation
• New Product Development
• 360° View of the Customer
• Analyze Brand Sentiment
• Localized, Personalized Promotions
• Website Optimization
• Optimal Store Layout
Financial
Services
Retail Telecom Manufacturing
HealthcareUtilities,
Oil & GasPublic
Sector
• Genomic data for medical trials
• Monitor patient vitals
• Reduce re-admittance rates
• Store medical research data
• Recruit cohorts for pharmaceutical trials
• Smart meter stream analysis
• Slow oil well decline curves
• Optimize lease bidding
• Compliance reporting
• Proactive equipment repair
• Seismic image processing
• Analyze public sentiment
• Protect critical networks
• Prevent fraud and waste
• Crowdsource reporting for repairs to infrastructure
• Fulfill open records requests
Page 10 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hadoop delivers compelling economics
✚
EDW Optimization
OPERATIONS
50%
ANALYTICS
20%
ETL PROCESS
30%
OPERATIONS
50% ANALYTICS
50%
Current Reality
EDW at capacity: some usage from low value workloads
Older data archived, unavailable for ongoing exploration
Source data often discarded
Augment w/ Hadoop
Free up EDW resources from low value tasks
Keep 100% of source data and historical data for ongoing exploration
Mine data for value after loading it because of schema-on-read
MPP
SAN
Engineered System
NAS
HADOOP
Cloud Storage
$0 $20,000 $40,000 $60,000 $80,000 $180,000
Fully-loaded Cost Per Raw TB
of Data (Min–Max Cost)
Commodity Compute & Storage
Hadoop Enables Scalable Compute & Storage at a Compelling Cost Structure
Page 11 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Delivering the Core Capabilities of Enterprise Hadoop
Page 12 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Core Capabilities of Enterprise Hadoop
Load data and manage
according to policy
Deploy and effectively
manage the platform
Store and process all of your Corporate Data Assets
Access your data simultaneously in multiple ways(batch, interactive, real-time) Provide layered
approach tosecurity through Authentication, Authorization,
Accounting, and Data Protection
DATA MANAGEMENT
SECURITYDATA ACCESSGOVERNANCE &
INTEGRATIONOPERATIONS
Enable both existing and new application toprovide value to the organization
PRESENTATION & APPLICATION
Empower existing operations and security tools to manage Hadoop
ENTERPRISE MGMT & SECURITY
Provide deployment choice across physical, virtual, cloud
DEPLOYMENT OPTIONS
Page 13 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Core Technologies of Enterprise Hadoop
Provision, Manage &
Monitor
Ambari
Zookeeper
Scheduling
Oozie
Data Workflow, Lifecycle
& Governance
Falcon
Sqoop
Flume
NFS
WebHDFS
YARN : Data Operating System
DATA MANAGEMENT
SECURITYDATA ACCESSGOVERNANCE &
INTEGRATION
Authentication
Authorization
Accounting
Data Protection
Storage: HDFS
Resources: YARN
Access: Hive, …
Pipeline: Falcon
Cluster: Knox
OPERATIONS
Script
Pig
Search
Solr
SQL
Hive/Tez,
HCatalog
NoSQL
HBase
Accumulo
Stream
Storm
Others
In-Memory
Analytics,
ISV engines
1 ° ° ° ° ° ° ° ° °
° ° ° ° ° ° ° ° ° °
° ° ° ° ° ° ° ° ° °
°
°
N
HDFS (Hadoop Distributed File System)
Batch
Map Reduce
Page 14 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hortonworks provides leadership to Hadoop
614,041
449,768
147,933
Total Net Lines Contributed to Apache Hadoop
End Users
25
10 Yahoo7 Cloudera
5 Facebook
3 IBM
3 LinkedIn
10 Others
Total Number of Committersto Apache Hadoop
Apache
ProjectCommitters
PMC
Members
Hadoop 21 13
Tez 10 4
Hive 15 3
HBase 8 3
Pig 6 5
Sqoop 1 0
Ambari 21 12
Knox 6 2
Falcon 4 2
Oozie 2 2
Zookeeper 2 1
Flume 1 0
Accumulo 2 2
Storm 1 0
TOTAL 103 49
Page 15 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
The Partners You Rely On, Rely On Hortonworks for Hadoop
Page 16 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hadoop is wholly integrated into the data center
AP
PLI
CA
TIO
NS
DA
TA
SY
ST
EM
SO
UR
CE
S
RDBMS EDW MPP
Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
HANA
OPERATIONAL TOOLS
DEV & DATA TOOLS
Existing Sources (CRM, ERP, Clickstream, Logs)
INFRASTRUCTURE
HDP 2.1
Go
ve
rna
nc
e
& In
teg
rati
on
Se
cu
rity
Op
era
tio
nsData Access
Data Management
DEPTHHortonworks engages in deep engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Redhat & SAP
BREADTHHundreds of partners work with us to certify their applications to work with Hadoop so they can extend big data to their users
Page 17 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Financial Services High Level Use Cases
Page 18 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Telco High Level Use Cases
Page 19 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hortonworks Data Platform
So
lr
Had
oo
p
&Y
AR
N
Pig
Tez
Hiv
e &
HC
ata
log
HB
ase
Sq
oo
p
Oo
zie
Zo
okeep
er
Mah
ou
t
Am
bari
Sto
rm
Flu
me
Kn
ox
Ph
oen
ix
Accu
mu
lo
HDP 2.1: Reliable, Consistent & Current
HDP certifies most recent & stable community innovation
2.2.0
1.1.2
0.11.0
0.11.0
0.12.0
0.12.0
HDP 1.3
May
2013
2.4.0 0.12.1
HDP 2.0
October
2013
HDP 2.1
April
2014
SecurityOperationsData AccessData
Management
0.13.0
0.94.6
0.96.1
0.98.0
0.9.1
0.7.0
0.8.0
0.9.04.7.2
1.4.3
1.4.4
1.3.1
1.4.0
1.2.5
1.4.4
1.5.1
3.3.2
4.0.0
3.4.5
0.4.0
0.4.04.0.0
1.5.1
Falc
on
0.5.0
Governance
& Integration
Page 20 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Stable Project Releases
Fixed Issues
Hortonworks process for Enterprise Hadoop
Upstream Community Projects Downstream Enterprise Product
Certified at scale using the most advanced
Hadoop test bed on the planet
• 1000’s of production nodes at Yahoo!• Over 1500 unit & system tests
HDP 2.1
Distribute
Integrate & Test
Package & Certify
ReleaseApache
Hadoop
Test &Patch
Design & Develop
Virtuous cycle when development & fixed issues done
upstream & stable project releases flow downstream
Design & Develop
Apache
HiveApache
HBase
Apache
Pig
Apache
Falcon
Apache
KnoxApache
Storm
Page 21 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Working in the community for the enterprise
Hortonworks de-risks your investmentE
All support and implementation backed by the largest and most experienced Hadoop team on the planet
A Two Way Street
� Gain access to the expertise and knowledge of the Hadoop community
� All issues or feature requests represented in the community
Apache Community Leadership
Support
Organization
Implementation
Team
Hortonworks Reseller, System Integrator,
Technology, and Training Partners
Engineering Team
Page 22 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Hortonworks: A Leader In Hadoop
The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014
“Hortonworks loves and lives
open source innovation”
Vision & Execution for Enterprise Hadoop.Hortonworks leads with a strong strategy and roadmap for open source innovation with Hadoop and a strong delivery of that innovation in Hortonworks Data Platform.
World Class Support and Services.Hortonworks' Customer Support received a maximum score and was significantly higher than both Cloudera and MapR.
Key Strategic Partnerships. Hortonworks unique strategic partnerships with Microsoft, SAP, Teradata and others are a key strength as part of its overall strategy of ecosystem partnership to accelerate Hadoop adoption in the enterprise.
Page 23 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Avoid Vendor
Lock In
Hortonworks Data Platform is close to the open source trunk as possible and is developed 100% in the open so you
are never locked in
Partnerships
that Matter
We work with partners to deeply integrate Hadoop with data center technologies so you can leverage existing skills and investments
Support from
the Experts
We provide the highest quality of support for deploying at scale. You
are supported by hundreds of years of Hadoop experience
The value of Open7
Connect With
the Community
We employ a large number of Apache project committers & innovators so that you
are represented in the open source community
Certified for
the Enterprise
We engineer, test and certify the Hortonworks Data Platform at scale to ensure reliability and stability you require for enterprise use
Page 24 © Hortonworks Inc. 2011 – 2014. All Rights Reserved
Thank You
Jeff Markham
Technical Director, APAC