Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)

© Talend 2014 1 Starting Small and Scaling Big with Hadoop November 20, 2014

Upload: hortonworks

Post on 02-Jul-2015


DESCRIPTION

Whether you are new to Hadoop or already run a mature cluster in production, scale will be a critical factor in your success with Hadoop. Are you ready to take the next big step as you scale out your data architecture? In this webinar, Talend and Hortonworks discuss how to implement an effective big data and Hadoop strategy across your IT infrastructure. You will learn:

• How to grow a pilot into production
• How to scale out architecture and systems affordably
• How to leverage the flexibility of Hadoop to optimize your data integration processes

Recording: http://www.talend.com/resources/webinars/starting-small-and-scaling-big-with-hadoop

TRANSCRIPT

Page 1: Starting Small and Scaling Big with Hadoop (Talend and Hortonworks webinar)) final

© Talend 2014 1

Starting Small and

Scaling Big with Hadoop

November 20, 2014

Page 2

Your Speakers Today

Jim Walker Director, Product Marketing

Julien Sauvage Director, Product Marketing

Page 3

Page 3 © Hortonworks Inc. 2011 – 2014. All Rights Reserved

2013 digital universe: 2.3 Zettabytes
2020 digital universe: 40 Zettabytes
85% of growth from new types of data, with machine-generated data increasing 15x
Enterprise data growth of 50x year over year through 2020
& Hadoop market $50B (analysts' consensus estimates)
1 Zettabyte (ZB) = 1 million Petabytes (PB); Sources: IDC and IDG Enterprise
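As a rough sanity check on the digital-universe figures on this slide, the Python sketch below converts 2.3 ZB and 40 ZB to petabytes using the slide's own conversion (1 ZB = 1 million PB) and derives the compound annual growth rate the two endpoints would imply. The function names and the compound-growth model are assumptions made for illustration; the slide does not say how its growth figures were calculated.

```python
# Conversion factor stated on the slide: 1 ZB = 1 million PB.
ZB_TO_PB = 1_000_000


def zb_to_pb(zettabytes: float) -> float:
    """Convert zettabytes to petabytes."""
    return zettabytes * ZB_TO_PB


def implied_annual_growth(start_size: float, end_size: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoints (an assumed model)."""
    return (end_size / start_size) ** (1 / years) - 1


size_2013_zb = 2.3   # 2013 digital universe, from the slide
size_2020_zb = 40.0  # 2020 digital universe, from the slide

total_factor = size_2020_zb / size_2013_zb
annual_rate = implied_annual_growth(size_2013_zb, size_2020_zb, 2020 - 2013)

print(f"2013: {zb_to_pb(size_2013_zb):,.0f} PB")
print(f"2020: {zb_to_pb(size_2020_zb):,.0f} PB")
print(f"Overall growth factor: {total_factor:.1f}x")
print(f"Implied compound annual growth: {annual_rate:.0%}")
```

Under these assumptions the two endpoints correspond to roughly a 17x increase over seven years, or about 50% compound growth per year.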

Page 4

A shift from reactive to proactive interactions

HDP and Hadoop allow organizations to shift interactions from reactive (post transaction) to proactive (pre decision):

A shift in Advertising: from mass branding …to 1x1 targeting
A shift in Financial Services: from educated investing …to automated algorithms
A shift in Healthcare: from mass treatment …to designer medicine
A shift in Retail: from static branding …to real-time personalization
A shift in Telco: from break then fix …to repair before break

Page 5

HDP Realized Cost Savings with EDW Optimization

Archive data away from the EDW
• Move cold or rarely used data to Hadoop as an active archive
• Store more data for longer

Offload costly ETL processes
• Free your EDW to perform high-value functions like analytics & operations, not ETL
• Use Hadoop for advanced ETL

Optimize the value of your EDW
• Use Hadoop to refine new data sources, such as web and machine data, for new analytical context

[Diagram labels]
ANALYTICS: Data Marts; Business Analytics; Visualization & Dashboards
DATA SYSTEMS: Systems of Record (RDBMS, ERP, CRM, Other); HDP 2.2 (ELT; nodes 1 … N; Cold Data, Deeper Archive & New Sources); Enterprise Data Warehouse (Hot)
NEW SOURCES: Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured

Hadoop helps you optimize and reduce costs associated with your EDW
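The "active archive" idea on this slide, keeping hot data in the EDW and moving cold or rarely used data to Hadoop, can be sketched as a simple classification step. Everything in the Python example below is hypothetical: the one-year cutoff, the `last_accessed` field, and the table names are assumptions for illustration, not something the slide or HDP prescribes.

```python
from datetime import date, timedelta

# Hypothetical cutoff: treat data untouched for more than a year as "cold".
COLD_AFTER = timedelta(days=365)


def split_hot_cold(records, today, cold_after=COLD_AFTER):
    """Split records into (hot, cold) lists by time since last access."""
    hot, cold = [], []
    for record in records:
        age = today - record["last_accessed"]
        if age > cold_after:
            cold.append(record)   # candidate for the Hadoop active archive
        else:
            hot.append(record)    # stays in the EDW
    return hot, cold


# Hypothetical table inventory with last-access dates.
tables = [
    {"name": "orders_2014", "last_accessed": date(2014, 11, 1)},
    {"name": "orders_2009", "last_accessed": date(2010, 3, 15)},
]

hot, cold = split_hot_cold(tables, today=date(2014, 11, 20))
print("Keep in EDW:", [t["name"] for t in hot])
print("Archive to Hadoop:", [t["name"] for t in cold])
```

In practice the cutoff would be driven by query patterns and retention policy rather than a fixed number of days.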

Page 6

Realize dramatic savings in the cost of storage

Cost Efficiencies
Reduce costs associated with expensive archive systems
• Utilize existing relationships with hardware vendors
• Open source software

Active Archive
Provide access to archived data rather than letting it collect dust

[Chart: Fully-loaded Cost Per Raw TB of Data (Min–Max Cost), on a scale from $0 to $180,000, for MPP, SAN, Engineered System, NAS, Hadoop and Cloud Storage]

Hadoop enables scalable compute & storage at a compelling cost structure

Storage costs/compute costs: from $19/GB to $0.23/GB
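To put the "$19/GB to $0.23/GB" figure in perspective, the short Python sketch below works out the savings factor, the percentage reduction, and the equivalent per-terabyte cost. The helper names are illustrative, and the use of decimal terabytes (1 TB = 1,000 GB) is an assumption; the slide does not state which systems the two prices refer to.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes


def cost_per_tb(cost_per_gb: float) -> float:
    """Scale a per-GB cost to a per-TB cost."""
    return cost_per_gb * GB_PER_TB


def savings_factor(before: float, after: float) -> float:
    """How many times cheaper the new cost is."""
    return before / after


def percent_reduction(before: float, after: float) -> float:
    """Cost reduction as a percentage of the original cost."""
    return (1 - after / before) * 100


before_per_gb = 19.00  # from the slide
after_per_gb = 0.23    # from the slide

print(f"Per TB: ${cost_per_tb(before_per_gb):,.0f} -> ${cost_per_tb(after_per_gb):,.0f}")
print(f"Savings factor: {savings_factor(before_per_gb, after_per_gb):.1f}x")
print(f"Reduction: {percent_reduction(before_per_gb, after_per_gb):.1f}%")
```

With the slide's two numbers this comes to a reduction of a little under 99%, or roughly an 83-fold difference.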

Page 7

Unlock New Applications from New Types of Data

Columns: Industry | Use Case | Sentiment & Web | Clickstream & Behavior | Machine & Sensor | Geographic | Server Logs | Structured & Unstructured

Financial Services
New Account Risk Screens ✔ ✔
Trading Risk ✔
Insurance Underwriting ✔ ✔ ✔

Telecom
Call Detail Records (CDR) ✔ ✔
Infrastructure Investment ✔ ✔
Real-time Bandwidth Allocation ✔ ✔ ✔

Retail
360° View of the Customer ✔ ✔ ✔
Localized, Personalized Promotions ✔
Website Optimization ✔

Manufacturing
Supply Chain and Logistics ✔
Assembly Line Quality Assurance ✔
Crowd-sourced Quality Assurance ✔

Healthcare
Use Genomic Data in Medical Trials ✔ ✔ ✔
Monitor Patient Vitals in Real-Time

Pharmaceuticals
Recruit and Retain Patients for Drug Trials ✔ ✔
Improve Prescription Adherence ✔ ✔ ✔ ✔

Oil & Gas
Unify Exploration & Production Data ✔ ✔ ✔ ✔
Monitor Rig Safety in Real-Time ✔ ✔ ✔

Government
ETL Offload/Federal Budgetary Pressures ✔ ✔
Sentiment Analysis for Government Programs ✔

Page 8

End Game: Data Lake - An architectural shift

Unlocking the Data Lake

Data Lake Enabled by YARN
• Single data repository, shared infrastructure
• Multiple biz apps accessing all the data
• Enable a shift from reactive to proactive interactions
• Gain new insight across the entire enterprise

[Diagram: axes SCALE and SCOPE; RDBMS, MPP, EDW; New Analytic Apps or IT Optimization; HDP 2.1 stack: Governance & Integration, Security, Operations, Data Access, Data Management, YARN]

Page 9

Enterprise Goals for the Modern Data Architecture

• Consolidate siloed data sets, structured and unstructured
• Central data set on a single cluster
• Multiple workloads across batch, interactive and real time
• Central services for security, governance and operation
• Preserve existing investment in current tools and platforms
• Single view of the customer, product, supply chain

[Diagram labels]
APPLICATIONS: Business Analytics; Custom Applications; Packaged Applications
DATA SYSTEM: RDBMS; EDW; MPP; YARN: Data Operating System (Interactive, Real-Time, Batch; nodes 1 … N); HDFS (Hadoop Distributed File System)
SOURCES: EXISTING Systems (CRM, ERP, Other); Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured

Page 10

HDP delivers a comprehensive data management platform

Hortonworks Data Platform 2.2

[Diagram labels]
DATA MANAGEMENT: HDFS (Hadoop Distributed File System); YARN: Data Operating System (Cluster Resource Management); nodes 1 … N
BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Script (Pig); SQL (Hive); Java/Scala (Cascading); Stream (Storm); Search (Solr); NoSQL (HBase, Accumulo); In-Memory (Spark); Others (ISV Engines); execution layers shown: Tez, Slider
GOVERNANCE: Data Workflow, Lifecycle & Governance (Falcon, Sqoop, Flume, Kafka, NFS, WebHDFS)
SECURITY: Authentication, Authorization, Accounting, Data Protection (Storage: HDFS; Resources: YARN; Access: Hive, …; Pipeline: Falcon; Cluster: Knox; Cluster: Ranger)
OPERATIONS: Provision, Manage & Monitor (Ambari, Zookeeper); Scheduling (Oozie)
DEPLOYMENT CHOICE: Linux; Windows; On-Premises; Cloud

• YARN is the architectural center of HDP
• Enables batch, interactive and real-time workloads
• Provides comprehensive enterprise capabilities
• The widest range of deployment options
• Delivered completely in the open

Page 11

HDP and Talend in the Modern Data Architecture

[Diagram labels]
SOURCES: EXISTING Systems; Clickstream; Web & Social; Geolocation; Sensor & Machine; Server Logs; Unstructured
DATA SYSTEM: RDBMS; EDW; HDP 2.1 (Governance & Integration, Security, Operations, Data Access, Data Management, YARN)
APPLICATIONS: BusinessObjects BI
Alongside: OPERATIONAL TOOLS; DEV & DATA TOOLS; INFRASTRUCTURE
Further labels: Hadoop 2.0, YARN; Data Quality; Pig, Hive; ETL, ELT; HBase, NoSQL

Deep Partnerships: Hortonworks engages in deep engineered relationships with the leaders in the data center, such as Microsoft, Teradata, Red Hat, HP, SAS & SAP.

Broad Partnerships: Over 600 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users.

Page 12

Connecting the Data-Driven Enterprise

Page 13

The Talend Platform

Page 14

Still Hand-Coding Data Integration?

Hand-coding
• Unproductive
• Need specialized skills
• Hard to maintain
• Limited support

Talend Enterprise
• 800+ drag-n-drop components
• Generates optimized code
• Collaboration & management
• Gold support (SLAs)

Page 15

Encumbered with Legacy ETL?

Legacy ETL
• Proprietary engine
• Hard to scale Big Data
• Expensive

Talend Enterprise
• Open
• Generates native code
• Low TCO

Page 16

Future-Proof Architecture

• ETL: day-to-day integration (Java)
• ELT: DW appliance (SQL)
• Hadoop: highly scalable (MapReduce)
• Camel: message transformation
• Next big thing

Page 17

ONE cluster to deploy

ONE cluster to manage

ONE cluster to monitor

ONE cluster to scale

ONE cluster to update

ONE cluster to pay for!

And it will be 100x faster in 2 years

Infinite Scale

Page 18

Unlock New Applications from New Types of Data

Columns: Industry | Use Case | Sentiment & Web | Clickstream & Behavior | Machine & Sensor | Geographic | Server Logs | Structured & Unstructured

Financial Services
New Account Risk Screens ✔ ✔
Trading Risk ✔
Insurance Underwriting ✔ ✔ ✔

Telecom
Call Detail Records (CDR) ✔ ✔
Infrastructure Investment ✔ ✔
Real-time Bandwidth Allocation ✔ ✔ ✔

Retail
360° View of the Customer ✔ ✔ ✔
Localized, Personalized Promotions ✔
Website Optimization ✔

Manufacturing
Supply Chain and Logistics ✔
Assembly Line Quality Assurance ✔
Crowd-sourced Quality Assurance ✔

Healthcare
Use Genomic Data in Medical Trials ✔ ✔ ✔
Monitor Patient Vitals in Real-Time

Pharmaceuticals
Recruit and Retain Patients for Drug Trials ✔ ✔
Improve Prescription Adherence ✔ ✔ ✔ ✔

Oil & Gas
Unify Exploration & Production Data ✔ ✔ ✔ ✔
Monitor Rig Safety in Real-Time ✔ ✔ ✔

Government
ETL Offload/Federal Budgetary Pressures ✔ ✔
Sentiment Analysis for Government Programs ✔

integration jobs + ++

Page 19

Simplify Real-Time Big Data

New components for streaming data
• 100x performance increase
• < 1 sec response
• Address new use cases (last minute defense, dynamic pricing, real-time fraud detection, etc.)

Page 20

The Talend Solution

Scalable

• Generates native code

• Future-proof

• Built-in data quality

• More productive

• Open source

• Innovative

Agile

• Open source platform

• Learn once

• Expand many times

Easy

• Subscription pricing

• Per developer

• Predictable cost

Lowest TCO

The ease of use of the Talend platform allows us to deliver

Page 21

The Three Drivers of Success

Product Innovation Market Adoption Industry Recognition

Customers

Community

Partners

“Visionary”

“Leader”

Multi-award winner

Big Data

Cloud

Page 22

Customer Case Study

Product Inventory and Pricing

Page 23

The Old Way to Do Forecasting

Product category “HALLOWEEN”

Page 24

Data Explosion in Size

Multiple SKUs (10,000's) × multiple stores (1,000's)

Example products (Product 1, Product 2, Product 3): Halloween mask, Halloween candies, Pumpkin
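The size explosion this slide describes comes from forecasting per SKU and per store instead of per product category. A minimal Python sketch of that multiplication follows; the specific counts are hypothetical values at the low end of the slide's orders of magnitude (10,000's of SKUs, 1,000's of stores), and the function name is illustrative.

```python
def forecast_series_count(num_skus: int, num_stores: int) -> int:
    """Number of SKU-by-store combinations that each need their own forecast."""
    return num_skus * num_stores


# Old way: one forecast for a whole category such as "HALLOWEEN".
category_level_series = 1

# New way: hypothetical low-end counts for SKUs and stores.
skus = 10_000
stores = 1_000
sku_store_series = forecast_series_count(skus, stores)

print(f"Category-level series: {category_level_series:,}")
print(f"SKU x store series:    {sku_store_series:,}")
```

Even at these low-end counts, the number of series to forecast grows from one per category to ten million SKU and store combinations.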

Page 25

Need for a Modern Architecture

[Diagram labels: data at rest; data in motion; DAO; Cassandra; OLTP; Hadoop; EDW; BI; Viz & Analytics]

Graphical; generates code; runs on Hadoop

Page 26

A New EDW “Eco” System

[Diagram labels: Enterprise Intelligence & Advanced Analytics; SSAS; Enterprise Data Warehouse; Advanced Analytics Platform or Data Refinery & Ingest Engine; Fast Data Cache]

Page 27

Talend + Hortonworks = Open = Awesome!

• Pure open source governed cluster

• Don’t need to recode or reformat data

• No vendor lock-in

• Subscription models

• Most recent releases of Apache projects

• We are always aligned and up to date

Page 28

The Forrester Wave™: Big Data Hadoop Solutions, Q1 2014

“Hortonworks loves and lives open source innovation”

World Class Support and Services. Hortonworks' Customer Support received a maximum score and was significantly higher than both Cloudera and MapR.

A Leader in Hadoop

Page 29

Questions?

Jim Walker @jaymce

Julien Sauvage @sauvageju

Page 30

Check Out Our Talend + Hortonworks Sandbox!

http://www.talend.com