Page 1 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

Modern Data Architecture with Enterprise Apache Hadoop

Hortonworks. We do Hadoop.

Jeff Markham

Technical Director, APAC

[email protected]


Our Mission: Enable your Modern Data Architecture by delivering Enterprise Apache Hadoop

Open Leadership: Drive innovation in the open exclusively via the Apache community-driven open source process

Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind

Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills

Reseller Partners:

Headquartered in Palo Alto, CA; 300+ employees and growing


A data architecture under pressure from new data

[Diagram: existing data architecture in three layers]

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications

DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP)

SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs)

Source: IDC

• 2.8 ZB in 2012
• 85% from New Data Types
• 15x Machine Data by 2020
• 40 ZB by 2020

Data sources:

• OLTP, ERP, CRM Systems
• Unstructured documents, emails
• Clickstream
• Server logs
• Sentiment, Web Data
• Sensor, Machine Data
• Geolocation
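
The two IDC volume figures on this slide (2.8 ZB in 2012 and 40 ZB by 2020) imply a compound annual growth rate. The sketch below works that arithmetic out. The function name `cagr` is illustrative and not part of the deck, and the calculation assumes smooth year-over-year growth between the two data points.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate between two values over a number of years."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted on the slide (Source: IDC); smooth growth is an assumption.
ZB_2012 = 2.8
ZB_2020 = 40.0

rate = cagr(ZB_2012, ZB_2020, 2020 - 2012)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption, the global data volume grows by a little under 40% per year.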


Hadoop within an emerging Modern Data Architecture

[Diagram: Hadoop alongside existing repositories in the data system layer]

APPLICATIONS: Business Analytics, Custom Applications, Packaged Applications

DATA SYSTEM: REPOSITORIES (RDBMS, EDW, MPP) alongside Hadoop (Governance & Integration, Data Access, Data Management, Security, Operations)

SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data

OPERATIONS TOOLS: Provision, Manage & Monitor

DEV & DATA TOOLS: Build & Test


Hadoop: typically used for new analytic applications…

[Chart axes: SCALE, SCOPE]

New Analytic Apps: New types of data, LOB-driven


… and incrementally delivers a ‘Data Lake’

[Chart axes: SCALE, SCOPE]

A Modern Data Architecture/Data Lake

New Analytic Apps: New types of data, LOB-driven

Existing repositories: RDBMS, MPP, EDW

Hadoop: Governance & Integration, Data Access, Data Management, Security, Operations

Data Lake: An architectural shift in the data center that uses Hadoop to deliver deeper insight across a large, broad, diverse set of data at efficient scale


Hadoop Value: New types of data

• Clickstream: Capture and analyze website visitors’ data trails and optimize your website
• Sensors: Discover patterns in data streaming automatically from remote sensors and machines
• Server Logs: Research logs to diagnose process failures and prevent security breaches
• Sentiment: Understand how your customers feel about your brand and products, right now
• Geographic: Analyze location-based data to manage operations where they occur
• Unstructured: Understand patterns in files across millions of web pages, emails, and documents


Hadoop unlocks a new approach: Iterative Analytics

Current Reality: SQL, Single Query Engine

• Repeatable Linear Process: Determine list of questions; Design solutions; Collect structured data; Ask questions from list; Detect additional questions
• Apply schema on write
• Dependent on IT

Augment w/ Hadoop: Multiple Query Engines (Batch, Interactive, Real-time, Streaming)

• Iterative Process: Explore, Transform, Analyze
• Apply schema on read
• Support range of access patterns to data stored in HDFS: polymorphic access
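
The contrast between schema on write and schema on read can be illustrated in a few lines of plain Python. This is a minimal sketch, not Hadoop code: the raw records, field names, and the `read_with_schema` helper are all hypothetical, and an in-memory string stands in for files stored in HDFS. The point is that the raw data is stored once, unmodified, and each question brings its own schema at read time.

```python
import csv
import io

# Raw events kept exactly as they arrived (stand-in for files in HDFS).
RAW_EVENTS = """\
2014-04-01T10:00:00,alice,/home,200
2014-04-01T10:00:05,bob,/cart,500
2014-04-01T10:00:09,alice,/checkout,200
"""


def read_with_schema(raw_text, schema):
    """Yield one dict per raw line, applying (name, converter) pairs at read time."""
    for row in csv.reader(io.StringIO(raw_text)):
        # zip() stops at the shorter sequence, so a schema may cover
        # only the leading fields it cares about.
        yield {name: conv(value) for (name, conv), value in zip(schema, row)}


# Question 1: which requests failed? Needs the status code as an integer.
STATUS_SCHEMA = [("ts", str), ("user", str), ("path", str), ("status", int)]
errors = [r for r in read_with_schema(RAW_EVENTS, STATUS_SCHEMA) if r["status"] >= 500]

# Question 2 (thought of later): which pages did each user visit?
# A different schema over the same raw data, with no reload required.
PATH_SCHEMA = [("ts", str), ("user", str), ("path", str)]
pages_by_user = {}
for rec in read_with_schema(RAW_EVENTS, PATH_SCHEMA):
    pages_by_user.setdefault(rec["user"], []).append(rec["path"])
```

With schema on write, the second question would typically require changing the table design and reloading the data before it could be asked.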


New analytic applications for new types of data

Financial Services

• New Account Risk Screens
• Fraud Prevention
• Trading Risk
• Maximize Deposit Spread
• Insurance Underwriting
• Accelerate Loan Processing

Retail

• 360° View of the Customer
• Analyze Brand Sentiment
• Localized, Personalized Promotions
• Website Optimization
• Optimal Store Layout

Telecom

• Call Detail Records (CDRs)
• Infrastructure Investment
• Next Product to Buy (NPTB)
• Real-time Bandwidth Allocation
• New Product Development

Manufacturing

• Supplier Consolidation
• Supply Chain and Logistics
• Assembly Line Quality Assurance
• Proactive Maintenance
• Crowdsourced Quality Assurance

Healthcare

• Genomic data for medical trials
• Monitor patient vitals
• Reduce re-admittance rates
• Store medical research data
• Recruit cohorts for pharmaceutical trials

Utilities, Oil & Gas

• Smart meter stream analysis
• Slow oil well decline curves
• Optimize lease bidding
• Compliance reporting
• Proactive equipment repair
• Seismic image processing

Public Sector

• Analyze public sentiment
• Protect critical networks
• Prevent fraud and waste
• Crowdsource reporting for repairs to infrastructure
• Fulfill open records requests


Hadoop delivers compelling economics

EDW Optimization

Current Reality (EDW workload: OPERATIONS 50%, ANALYTICS 20%, ETL PROCESS 30%)

• EDW at capacity: some usage from low value workloads
• Older data archived, unavailable for ongoing exploration
• Source data often discarded

Augment w/ Hadoop (EDW workload: OPERATIONS 50%, ANALYTICS 50%)

• Free up EDW resources from low value tasks
• Keep 100% of source data and historical data for ongoing exploration
• Mine data for value after loading it because of schema-on-read
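
The workload shares on this slide amount to a small piece of arithmetic. The sketch below assumes that the whole ETL share leaves the EDW and that all of the freed capacity goes to analytics, which is how the before and after figures line up. The `offload` helper is hypothetical, and the percentages are the slide's own.

```python
# EDW workload shares from the slide, in percent of capacity.
current = {"operations": 50, "analytics": 20, "etl": 30}


def offload(workloads, moved, beneficiary):
    """Remove one workload from the EDW and give its share to another."""
    result = dict(workloads)
    freed = result.pop(moved)
    result[beneficiary] += freed
    return result


augmented = offload(current, "etl", "analytics")
growth = augmented["analytics"] / current["analytics"]

print(augmented)
print(f"Analytics capacity grows {growth:.1f}x")
```

Under these assumptions, the analytics share of the same EDW hardware rises from 20% to 50%, a 2.5x increase.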

Hadoop Enables Scalable Compute & Storage at a Compelling Cost Structure

[Chart: Fully-loaded Cost Per Raw TB of Data (Min–Max Cost), axis from $0 to $180,000, comparing MPP, SAN, Engineered System, NAS, HADOOP, and Cloud Storage]

Commodity Compute & Storage


Delivering the Core Capabilities of Enterprise Hadoop


Core Capabilities of Enterprise Hadoop

• GOVERNANCE & INTEGRATION: Load data and manage according to policy
• OPERATIONS: Deploy and effectively manage the platform
• DATA MANAGEMENT: Store and process all of your Corporate Data Assets
• DATA ACCESS: Access your data simultaneously in multiple ways (batch, interactive, real-time)
• SECURITY: Provide layered approach to security through Authentication, Authorization, Accounting, and Data Protection
• PRESENTATION & APPLICATION: Enable both existing and new applications to provide value to the organization
• ENTERPRISE MGMT & SECURITY: Empower existing operations and security tools to manage Hadoop
• DEPLOYMENT OPTIONS: Provide deployment choice across physical, virtual, cloud


Core Technologies of Enterprise Hadoop

OPERATIONS

• Provision, Manage & Monitor: Ambari, Zookeeper
• Scheduling: Oozie

GOVERNANCE & INTEGRATION

• Data Workflow, Lifecycle & Governance: Falcon, Sqoop, Flume, NFS, WebHDFS

DATA ACCESS (on YARN : Data Operating System)

• Batch: Map Reduce
• Script: Pig
• SQL: Hive/Tez, HCatalog
• NoSQL: HBase, Accumulo
• Stream: Storm
• Search: Solr
• Others: In-Memory Analytics, ISV engines

DATA MANAGEMENT

• HDFS (Hadoop Distributed File System), across nodes 1 to N

SECURITY: Authentication, Authorization, Accounting, Data Protection

• Storage: HDFS
• Resources: YARN
• Access: Hive, …
• Pipeline: Falcon
• Cluster: Knox
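
Of the integration interfaces listed on this slide, WebHDFS is the simplest to show in a few lines, because it exposes HDFS operations as REST calls under the `/webhdfs/v1` path prefix. The sketch below only builds request URLs and does not contact a cluster. The host name, the HDFS path, and the `webhdfs_url` helper are hypothetical, and port 50070 is assumed as the NameNode HTTP port commonly used by Hadoop 2.x installations.

```python
from urllib.parse import quote, urlencode


def webhdfs_url(host, port, path, op, params=None):
    """Build a WebHDFS REST URL for an HDFS path and operation name."""
    query = urlencode({"op": op, **(params or {})})
    # quote() leaves "/" untouched by default, so the HDFS path keeps its shape.
    return f"http://{host}:{port}/webhdfs/v1{quote(path)}?{query}"


# Hypothetical NameNode and path; 50070 is an assumed default, check your cluster.
list_url = webhdfs_url("namenode.example.com", 50070, "/data/logs", "LISTSTATUS")
open_url = webhdfs_url(
    "namenode.example.com", 50070, "/data/logs/app.log", "OPEN",
    params={"user.name": "alice"},
)
print(list_url)
print(open_url)
```

An HTTP GET on the first URL would list the directory, and a GET on the second would read the file, subject to the cluster's security configuration.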


Hortonworks provides leadership to Hadoop

614,041

449,768

147,933

Total Net Lines Contributed to Apache Hadoop

End Users

Total Number of Committers to Apache Hadoop

25
10 Yahoo
7 Cloudera
5 Facebook
3 IBM
3 LinkedIn
10 Others

Apache Project   Committers   PMC Members
Hadoop           21           13
Tez              10           4
Hive             15           3
HBase            8            3
Pig              6            5
Sqoop            1            0
Ambari           21           12
Knox             6            2
Falcon           4            2
Oozie            2            2
Zookeeper        2            1
Flume            1            0
Accumulo         2            2
Storm            1            0
TOTAL            103          49


The Partners You Rely On, Rely On Hortonworks for Hadoop


Hadoop is wholly integrated into the data center

[Diagram: HDP 2.1 integrated with the existing data center]

APPLICATIONS

DATA SYSTEM: RDBMS, EDW, MPP, HANA, and HDP 2.1 (Governance & Integration, Data Access, Data Management, Security, Operations)

SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)

OPERATIONAL TOOLS

DEV & DATA TOOLS

INFRASTRUCTURE

DEPTH: Hortonworks engages in deep engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat & SAP

BREADTH: Hundreds of partners work with us to certify their applications to work with Hadoop so they can extend big data to their users


Financial Services High Level Use Cases


Telco High Level Use Cases


Hortonworks Data Platform

HDP 2.1: Reliable, Consistent & Current

HDP certifies most recent & stable community innovation

[Chart: component versions certified in each HDP release, grouped by Data Management, Data Access, Governance & Integration, Security, and Operations]

Releases: HDP 1.3 (May 2013), HDP 2.0 (October 2013), HDP 2.1 (April 2014)

Components: Solr, Hadoop & YARN, Pig, Tez, Hive & HCatalog, HBase, Sqoop, Oozie, Zookeeper, Mahout, Ambari, Storm, Flume, Knox, Phoenix, Accumulo, Falcon

Version labels in the chart, in extraction order: 2.2.0, 1.1.2, 0.11.0, 0.11.0, 0.12.0, 0.12.0, 2.4.0, 0.12.1, 0.13.0, 0.94.6, 0.96.1, 0.98.0, 0.9.1, 0.7.0, 0.8.0, 0.9.0, 4.7.2, 1.4.3, 1.4.4, 1.3.1, 1.4.0, 1.2.5, 1.4.4, 1.5.1, 3.3.2, 4.0.0, 3.4.5, 0.4.0, 0.4.0, 4.0.0, 1.5.1, 0.5.0


Hortonworks process for Enterprise Hadoop

Upstream Community Projects: Design & Develop, Test & Patch, Release (Apache Hadoop, Apache Hive, Apache HBase, Apache Pig, Apache Falcon, Apache Knox, Apache Storm)

Downstream Enterprise Product: Integrate & Test, Package & Certify, Distribute (HDP 2.1)

Flows between the two: Stable Project Releases, Fixed Issues

Virtuous cycle when development & fixed issues done upstream & stable project releases flow downstream

Certified at scale using the most advanced Hadoop test bed on the planet

• 1000’s of production nodes at Yahoo!
• Over 1500 unit & system tests


Working in the community for the enterprise

Hortonworks de-risks your investment

All support and implementation backed by the largest and most experienced Hadoop team on the planet

A Two Way Street

• Gain access to the expertise and knowledge of the Hadoop community
• All issues or feature requests represented in the community

[Diagram: Apache Community Leadership; Engineering Team; Support Organization; Implementation Team; Hortonworks Reseller, System Integrator, Technology, and Training Partners]


Hortonworks: A Leader In Hadoop

The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014

“Hortonworks loves and lives open source innovation”

Vision & Execution for Enterprise Hadoop. Hortonworks leads with a strong strategy and roadmap for open source innovation with Hadoop and a strong delivery of that innovation in Hortonworks Data Platform.

World Class Support and Services. Hortonworks' Customer Support received a maximum score and was significantly higher than both Cloudera and MapR.

Key Strategic Partnerships. Hortonworks' unique strategic partnerships with Microsoft, SAP, Teradata and others are a key strength as part of its overall strategy of ecosystem partnership to accelerate Hadoop adoption in the enterprise.


The value of Open…

Avoid Vendor Lock In: Hortonworks Data Platform is as close to the open source trunk as possible and is developed 100% in the open so you are never locked in

Partnerships that Matter: We work with partners to deeply integrate Hadoop with data center technologies so you can leverage existing skills and investments

Support from the Experts: We provide the highest quality of support for deploying at scale. You are supported by hundreds of years of Hadoop experience

Connect With the Community: We employ a large number of Apache project committers & innovators so that you are represented in the open source community

Certified for the Enterprise: We engineer, test and certify the Hortonworks Data Platform at scale to ensure the reliability and stability you require for enterprise use


Thank You

Jeff Markham

Technical Director, APAC

[email protected]