
Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Welcome

Nicolas Maillard – Hortonworks


HDP Enabling the Modern Data Architecture

Nicolas Maillard – Hortonworks

Hortonworks enables adoption of Apache Hadoop through HDP (Hortonworks Data Platform)

• Founded in 2011
• Original 24 architects, developers, operators of Hadoop from Yahoo!
• We are leaders in the Hadoop community
• 500+ employees

Customer Momentum
• 300+ customers in seven quarters, growing at 75+/quarter
• Two thirds of customers come from F1000

Hortonworks and Hadoop at Scale
• HDP in production on the largest clusters on the planet
• Multiple 1,000+ node clusters, including 35,000 nodes at Yahoo! and 800 nodes at Spotify

Our Mission
To enable Apache Hadoop to be the enterprise data platform that powers the modern data architecture and processes half the world’s data

Our Strategy: A Commitment to Enterprise Hadoop

1. Innovate the Core
Architect and build innovation at the core of Hadoop
• YARN transformed Hadoop to enable multiple workloads across a multi-tenant architecture

2. Extend Hadoop as an Enterprise Data Platform
Extend Hadoop with enterprise capabilities for governance, security & operations
• Apply enterprise software rigor to the open source development process

3. Enable the Ecosystem
Enable the leaders in the data center to easily adopt & extend their platforms
• Establish Hadoop as standard component of a modern data architecture
• Joint engineering

[Diagram: HDP 2.1 stack. Governance & Integration, Security and Operations surround Data Access and Data Management; YARN: Data Operating System runs Interactive, Real-Time and Batch workloads across nodes 1 to N on HDFS (Hadoop Distributed File System)]

Apache Project   Committers   PMC Members
Hadoop           27           20
Pig              5            5
Hive             16           4
Tez              15           15
HBase            6            4
Phoenix          4            4
Accumulo         2            2
Storm            3            2
Slider           10           10
Flume            1            0
Sqoop            1            0
Ambari           32           27
Oozie            3            2
Zookeeper        2            1
Knox             11           5
Argus            10           n/a
Falcon           5            3
TOTAL            153          104

[Diagram: YARN: Data Operating System on HDFS (Hadoop Distributed File System), with engines by access type: Script (Pig), Memory (Spark), SQL (Hive/Tez, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), Batch (MapReduce)]

Contributes more to the Apache Hadoop ecosystem in the ASF than any other vendor

Innovating within the community for the enterprise

• Open Source: fastest path to innovation for a platform technology
• Complete open source platform speeds enterprise and ecosystem adoption and minimizes lock-in

• Enables the market to function much bigger, much faster

…all done completely in Open Source 4

[Diagram: HDP 2.1 stack: Governance & Integration, Security, Operations, Data Access, Data Management, YARN]

Driving our innovation through Apache Software Foundation Projects

Hadoop Driver: Enabling the data lake

[Chart: Journey to the Data Lake with Hadoop. Axes: SCALE and SCOPE; the path leads to the DATA LAKE and Systems of Insight]

Data Lake Definition
• Centralized Architecture: Multiple applications on a shared data set with consistent levels of service
• Any App, Any Data: Multiple applications accessing all data, affording new insights and opportunities
• Unlocks ‘Systems of Insight’: Advanced algorithms and applications used to derive new value and optimize existing value

Drivers:
1. Cost Optimization
2. Advanced Analytic Apps

Goal:
• Centralized Architecture
• Data-driven Business

2013 Digital universe: 4.4 Zettabytes
2020 Digital universe: 44 Zettabytes & Hadoop Market $50B

• 85% of growth from new types of data, with machine-generated data increasing 15x
• Data is doubling in size every two years

1 Zettabyte (ZB) = 1 million Petabytes (PB); Sources: IDC, IDG Enterprise, and AMR Research
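As a quick check on the growth figures quoted on this slide, the following sketch derives the doubling time implied by growing from 4.4 ZB (2013) to 44 ZB (2020). It assumes steady exponential growth, which the slide does not state; the function names are illustrative only.

```python
import math

# Figures quoted on the slide (zettabytes).
ZB_2013 = 4.4
ZB_2020 = 44.0
PB_PER_ZB = 1_000_000  # 1 ZB = 1 million PB, per the slide footnote


def implied_doubling_time(start: float, end: float, years: float) -> float:
    """Years needed to double, assuming steady exponential growth (an assumption)."""
    return years * math.log(2) / math.log(end / start)


def project(start: float, years: float, doubling_years: float) -> float:
    """Size after `years` if the volume doubles every `doubling_years`."""
    return start * 2 ** (years / doubling_years)


doubling = implied_doubling_time(ZB_2013, ZB_2020, 2020 - 2013)
print(f"Implied doubling time: {doubling:.2f} years")
print(f"2020 size at an exact two-year doubling: {project(ZB_2013, 7, 2.0):.1f} ZB")
print(f"4.4 ZB in petabytes: {ZB_2013 * PB_PER_ZB:,.0f} PB")
```

A tenfold increase over seven years works out to a doubling time of roughly 2.1 years, so the "doubling every two years" rule of thumb is consistent with the 4.4 ZB and 44 ZB figures to within rounding.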

Traditional systems under pressure

Challenges
• Constrains data to app
• Can’t manage new data
• Costly to Scale

[Chart: Business Value, Industry Leaders vs Laggards, 2012 (2.8 Zettabytes) to 2020 (40 Zettabytes). (1) Traditional: ERP, CRM, SCM. (2) New Data: Clickstream, Geolocation, Web Data, Internet of Things, Docs, emails, Server logs]

Big Data & Hadoop Market Drivers and Opportunities

Business Drivers
• From reactive analytics to proactive customer interaction
• Insights that drive competitive advantage & optimal returns

Financial Drivers
• Cost of data systems, as % of IT spend, continues to grow
• Cost advantages of commodity hardware & open source software

Technical Drivers
• Data is growing exponentially & existing systems are overwhelmed
• Predominantly driven by NEW types of data that can inform analytics

…to shift from reactive to proactive interactions

HDP and Hadoop allow organizations to shift interactions from Reactive (Post Transaction) to Proactive (Pre Decision):

• A shift in Advertising: from mass branding to 1x1 Targeting
• A shift in Financial Services: from Educated Investing to Automated Algorithms
• A shift in Healthcare: from mass treatment to Designer Medicine
• A shift in Retail: from static branding to Real-time Personalization
• A shift in Telco: from break then fix to repair before break

Existing silos under pressure from new data sources

[Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications) sit on a DATA SYSTEM of silos (RDBMS, EDW, MPP) fed by SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)]

Data growth: New Data Types
• OLTP, ERP, CRM Systems
• Unstructured docs, emails
• Clickstream
• Server logs
• Social/Web Data
• Sensor, Machine Data
• Geolocation

85% (Source: IDC)

Enterprise Goals for the Modern Data Architecture

• Consolidate siloed data sets, structured and unstructured
• Provide single view of the customer, product, supply chain
• Serve batch, interactive and real-time applications on shared datasets
• Central data set on a single cluster
• Central services for security, governance and operation
• Preserve existing investment in current tools and platforms

[Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications) over a DATA SYSTEM of RDBMS, EDW, MPP and YARN: Data Operating System (Interactive, Real-Time, Batch; nodes 1 to N) on HDFS (Hadoop Distributed File System), fed by SOURCES: EXISTING Systems (CRM, ERP, Other), Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured]

Traditional Hadoop, challenges & limitations

Architectural Limitations
• Primarily a batch system using MapReduce
• Single purpose clusters, specific data sets

Enterprise Challenges
• Limited enterprise capabilities: Operations, Security & Governance
• Created additional Silos

Interoperability Challenges
• Difficult to natively integrate existing applications

[Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications) over a DATA SYSTEM of RDBMS, EDW, MPP and MapReduce (Largely Batch Processing; nodes 1 to N) on HDFS (Hadoop Distributed File System), fed by SOURCES: EXISTING Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured]

YARN and HDP Enable the Modern Data Architecture

YARN is the architectural center of Hadoop and HDP
• YARN enables a common data set across all applications
• Batch, interactive & real-time workloads
• Support multi-tenant access & processing

HDP enables Apache Hadoop to become an Enterprise Viable Data Platform with centralized services
• Security
• Governance
• Operations
• Productization

Enabled broad ecosystem adoption

Hortonworks drove this innovation of Hadoop through YARN

[Diagram: HDP (Hortonworks Data Platform)
• GOVERNANCE & INTEGRATION: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, NFS, WebHDFS)
• DATA ACCESS (Batch, Interactive & Real-Time): Script (Pig), Search (Solr), SQL (Hive, HCatalog), NoSQL (HBase, Accumulo), Stream (Storm), In-Memory (Spark), Other (ISVs); Tez shown twice
• DATA MANAGEMENT: YARN: Data Operating System across nodes 1 to N on HDFS (Hadoop Distributed File System)
• SECURITY: Authentication, Authorization, Accounting, Data Protection; Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox
• OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)]

Key Drivers of Hadoop

[Diagram: APPLICATIONS (Business Analytics, Custom Applications, Packaged Applications); DATA SYSTEM REPOSITORIES (RDBMS, EDW, MPP); SOURCES; OPERATIONS TOOLS (Provision, Manage & Monitor); DEV & DATA TOOLS (Build & Test)]

A. Unlock New Approach to Analytics
• Agile analytics via “Schema on Read” with ability to store all data in native format
• Create new apps from new types of data

B. Optimize Investments, Cut Costs
• Focus EDW on high value workloads
• Use commodity servers & storage to enable all data (original and historical) to be accessible for ongoing exploration

C. Enable a Modern Data Architecture
• Integrate new & existing data sets
• Make all data available for shared access and processing in multitenant infrastructure
• Batch, interactive & real-time use cases
• Integrated with existing tools & skills
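To make the “Schema on Read” idea concrete, here is a minimal sketch in plain Python rather than Hive or HCatalog. Raw records are kept in their native form, and each reader applies its own schema when it reads them. The sample records, field names and the two schemas are hypothetical and are not taken from the deck.

```python
import json

# Raw events are stored exactly as they arrived (native format, no upfront schema).
RAW_LINES = [
    '{"ts": "2014-06-01T10:00:00", "user": "u1", "page": "/home", "ms": "120"}',
    '{"ts": "2014-06-01T10:00:05", "user": "u2", "page": "/cart"}',
    'not json at all',
]

# A schema is a mapping of field name -> converter, chosen at read time.
CLICK_SCHEMA = {"user": str, "page": str}
LATENCY_SCHEMA = {"page": str, "ms": int}


def read_with_schema(lines, schema):
    """Parse raw lines lazily, yielding only records that fit the schema."""
    for line in lines:
        try:
            record = json.loads(line)
            yield {name: cast(record[name]) for name, cast in schema.items()}
        except (ValueError, KeyError, TypeError):
            # Records that do not fit this schema are skipped at read time;
            # nothing was rejected when the data was written.
            continue


clicks = list(read_with_schema(RAW_LINES, CLICK_SCHEMA))
latencies = list(read_with_schema(RAW_LINES, LATENCY_SCHEMA))
print(clicks)
print(latencies)
```

The same stored lines serve two different applications: the click view accepts both well-formed records, while the latency view keeps only the record that carries an `ms` field.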

[Diagram: sources (EXISTING Systems, Clickstream, Web & Social, Geolocation, Sensor & Machine, Server Logs, Unstructured) feeding YARN: Data Operating System (Interactive, Real-Time, Batch) on HDFS: Hadoop Distributed File System]

Create New Applications from New Types of Data

Columns: INDUSTRY; USE CASE; then one ✔ column per data type: Sentiment & Web, Clickstream & Behavior, Machine & Sensor, Geographic, Server Logs, Structured & Unstructured

Financial Services
• New Account Risk Screens ✔ ✔
• Trading Risk ✔ ✔
• Insurance Underwriting ✔ ✔ ✔

Telecom
• Call Detail Records (CDR) ✔ ✔
• Infrastructure Investment ✔ ✔
• Real-time Bandwidth Allocation ✔ ✔ ✔ ✔ ✔

Retail
• 360° View of the Customer ✔ ✔ ✔
• Localized, Personalized Promotions ✔
• Website Optimization ✔

Manufacturing
• Supply Chain and Logistics ✔
• Assembly Line Quality Assurance ✔
• Crowd-sourced Quality Assurance ✔

Healthcare
• Use Genomic Data in Medical Trials ✔ ✔
• Monitor Patient Vitals in Real-Time ✔ ✔

Pharmaceuticals
• Recruit and Retain Patients for Drug Trials ✔ ✔
• Improve Prescription Adherence ✔ ✔ ✔ ✔

Oil & Gas
• Unify Exploration & Production Data ✔ ✔ ✔ ✔
• Monitor Rig Safety in Real-Time ✔ ✔ ✔

Government
• ETL Offload/Federal Budgetary Pressures ✔ ✔
• Sentiment Analysis for Government Programs ✔


YARN: Traditional to Modern Hadoop

Owen O’Malley – Founder, Hortonworks

In the beginning…

Traditional Hadoop (2006)

Traditional Hadoop allowed early adopters to deal with data at scale via:
• Single purpose clusters, specific data sets
• Primarily batch-oriented applications using MapReduce

However…
• No direct way to integrate interactive and real-time applications
• Limited enterprise capabilities: Operations, Security & Governance

[Diagram: MapReduce (Largely Batch Processing) across nodes 1 to N on HDFS (Hadoop Distributed File System)]

…with increased adoption and breadth of use cases, a new approach was needed

Timeline:
• 2006: Traditional Hadoop (MapReduce, Largely Batch Processing, on HDFS)
• JAN 2008: MAPREDUCE-279 outlines a NEW architecture for Hadoop which allows for efficient use of resources across many types of apps
• 2011: Hortonworks Founded. Work accelerates on Hadoop’s next-gen architecture

Enterprise Hadoop Era Begins

Timeline:
• 2006: Traditional Hadoop (MapReduce, Largely Batch Processing, on HDFS)
• 2008: MAPREDUCE-279
• 2011
• October 23, 2013: Hadoop 2 & YARN, the Core of Enterprise Hadoop

Architected & led development of YARN to enable the Modern Data Architecture

[Diagram: YARN: Data Operating System running Batch, Interactive and Real-Time workloads across nodes 1 to N on HDFS (Hadoop Distributed File System)]

Benefits Enabled by MDA and YARN

SOLUTION: A single set of data across the entire cluster with multiple access methods using “zones” for processing

Single Cluster, Multiple Workloads
• Maximize compute resources to lower TCO
• No standalone, siloed clusters
• Simple management & operations
…all enabled by YARN

[Diagram: processing zones across nodes 1 to n: Batch (Pig), Interactive (Hive), Real Time Streams (Storm), Real Time (HBase), In Memory (Spark)]
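The claim that a single shared cluster maximizes compute resources can be illustrated with simple sizing arithmetic. The sketch below does not model YARN scheduling. It only compares sizing one cluster per workload at that workload's own peak with sizing one shared cluster at the peak of the combined load. All demand numbers are hypothetical.

```python
# Hypothetical node demand per workload over six time slots (illustrative numbers only).
DEMAND = {
    "batch":       [80, 80, 20, 10, 10, 60],
    "interactive": [10, 20, 60, 70, 40, 10],
    "real_time":   [30, 30, 30, 35, 30, 30],
}


def siloed_capacity(demand):
    """Each standalone cluster must be sized for its own peak."""
    return sum(max(series) for series in demand.values())


def shared_capacity(demand):
    """One shared cluster must be sized for the peak of the combined load."""
    return max(sum(slot) for slot in zip(*demand.values()))


def average_utilization(demand, capacity):
    """Mean fraction of nodes busy across all time slots."""
    slots = list(zip(*demand.values()))
    return sum(sum(slot) for slot in slots) / (capacity * len(slots))


siloed = siloed_capacity(DEMAND)
shared = shared_capacity(DEMAND)
print(f"Siloed clusters: {siloed} nodes, "
      f"{average_utilization(DEMAND, siloed):.0%} average utilization")
print(f"Shared cluster:  {shared} nodes, "
      f"{average_utilization(DEMAND, shared):.0%} average utilization")
```

Because the workloads peak at different times in this made-up profile, the shared cluster needs fewer nodes than the three silos combined. The saving shrinks as the peaks coincide.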


Hadoop Does Interactive & Real-Time

Trucking Company

Use Case

Tom Benton

Trucking company w/ large fleet of trucks in Midwest

A truck generates millions of events for a given route; an event could be:
• ‘Normal’ events: starting / stopping of the vehicle
• ‘Violation’ events: speeding, excessive acceleration and braking, unsafe tail distance

Company uses an application that monitors truck locations and violations from the truck/driver in real-time

Route? Truck? Driver? Analysts query a broad history to understand if today’s violations are part of a larger problem with specific routes, trucks, or drivers
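The analyst question above (route, truck, or driver?) amounts to grouping historical violation events by each dimension. The sketch below shows that grouping in plain Python. In the deck this role is played by interactive queries over the stored history. The event type labels, field names and sample records are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical labels for the violation categories listed on the slide.
VIOLATION_TYPES = {
    "speeding",
    "excessive_acceleration",
    "excessive_braking",
    "unsafe_tail_distance",
}


@dataclass(frozen=True)
class TruckEvent:
    truck_id: str
    driver_id: str
    route_id: str
    event_type: str


def is_violation(event):
    """'Normal' events (start, stop) are anything not in VIOLATION_TYPES."""
    return event.event_type in VIOLATION_TYPES


def violations_by(events, dimension):
    """Count violation events grouped by 'route_id', 'truck_id' or 'driver_id'."""
    return Counter(getattr(e, dimension) for e in events if is_violation(e))


HISTORY = [
    TruckEvent("t1", "d1", "r1", "start"),
    TruckEvent("t1", "d1", "r1", "speeding"),
    TruckEvent("t2", "d2", "r1", "unsafe_tail_distance"),
    TruckEvent("t2", "d2", "r2", "stop"),
    TruckEvent("t1", "d1", "r2", "excessive_braking"),
]

for dimension in ("route_id", "truck_id", "driver_id"):
    print(dimension, violations_by(HISTORY, dimension).most_common())
```

Comparing the three counts side by side is what lets an analyst tell whether violations cluster on a route, a truck, or a driver.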

Trucking Company’s YARN-enabled Architecture

[Diagram components: Truck Sensors; Inbound Messaging (Kafka); Stream Processing (Storm); Alerts & Events (ActiveMQ); Real-time Serving (HBase); Real-Time User Interface; Interactive Query (Hive on Tez); Microsoft Excel; all running on Many Workloads: YARN over Distributed Storage: HDFS]

One cluster with consistent security, governance & operations
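As a rough illustration of how the components in this architecture could fit together, the sketch below simulates the real-time path with in-memory stand-ins. It does not use the Kafka, Storm, ActiveMQ or HBase APIs. The data flow between components, the event fields and the violation labels are assumptions made for the example and are not stated on the slide.

```python
from collections import deque

VIOLATIONS = {"speeding", "excessive_braking", "unsafe_tail_distance"}  # hypothetical labels

inbound = deque()      # stand-in for the inbound messaging layer (Kafka on the slide)
alerts = []            # stand-in for the alerts & events channel (ActiveMQ on the slide)
latest_by_truck = {}   # stand-in for the real-time serving store (HBase on the slide)
history = []           # stand-in for data kept in HDFS for later interactive queries


def process(event):
    """One stream-processing step: archive, update the serving view, raise alerts."""
    history.append(event)
    latest_by_truck[event["truck_id"]] = (event["lat"], event["lon"])
    if event["type"] in VIOLATIONS:
        alerts.append(f'{event["truck_id"]}: {event["type"]}')


def drain(queue):
    """Consume events in arrival order until the queue is empty."""
    while queue:
        process(queue.popleft())


inbound.extend([
    {"truck_id": "t1", "type": "start", "lat": 41.88, "lon": -87.63},
    {"truck_id": "t1", "type": "speeding", "lat": 41.90, "lon": -87.70},
    {"truck_id": "t2", "type": "stop", "lat": 39.10, "lon": -94.58},
])
drain(inbound)
print(alerts)
print(latest_by_truck)
```

Each event is written once and then serves three consumers: the alert channel, the latest-location view for the user interface, and the history used for interactive analysis. That mirrors the "one cluster, many workloads" point of the slide.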


Thank You. Questions?
